Reporter of genomic methylation and uses thereof

ABSTRACT

In some aspects, described herein is a DNA methylation reporter. In some aspects, the DNA methylation reporter comprises a promoter whose activity can be affected by exogenous methylation changes without being independently regulated by the DNA methylation machinery, operably linked to a DNA sequence that encodes a reporter molecule. In some embodiments the DNA methylation reporter comprises (i) a promoter derived from a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule that is detectable in individual mammalian cells, wherein the promoter is operably linked to the sequence that encodes the reporter molecule. Also described are nucleic acids that comprise the DNA methylation reporter, cells that have the DNA methylation reporter integrated into their genome, and non-human mammals comprising cells that have the DNA methylation reporter integrated into their genome. Also described are methods of measuring DNA methylation of a region of interest located in proximity to the DNA methylation reporter in the genome of a cell by detecting the reporter molecule.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/137,110, filed Mar. 23, 2015, U.S. Provisional Application No. 62/138,888, filed Mar. 26, 2015, and U.S. Provisional Application No. 62/139,611, filed Mar. 27, 2015. The entire teachings of the above applications are incorporated herein by reference.

GOVERNMENT SUPPORT

This invention was made with government support under grant number HD045022 awarded by the National Institutes of Health. The Government has certain rights in the invention.

BACKGROUND

DNA methylation is recognized as a principal contributor to the stability of gene expression in development and to the maintenance of cellular identity (Bird, 2002; Cedar and Bergman, 2012; Jaenisch and Bird, 2003; Reik et al., 2001). A variety of methods for measuring DNA methylation are available. These include digestion of DNA with methylation-sensitive restriction enzymes, affinity-based enrichment and sequencing of DNA fragments containing methylated cytosine, and chemical conversion methods. A widely used chemical conversion method relies on the fact that treatment of DNA with bisulfite converts cytosine to uracil but leaves 5-methylcytosine intact. Thus, 5-methylcytosine patterns can be mapped by treating DNA with bisulfite, followed by sequencing. Microarray analysis (e.g., using the Illumina 450K Human Methylation array) of bisulfite-treated DNA has also been extensively used in studying methylation.
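
The bisulfite conversion principle described above can be illustrated with a minimal in silico sketch. The function names, the toy sequence, and the assumption of a gap-free, same-strand alignment are illustrative only and are not part of the methods described herein; uracil is represented as T because that is how it reads after amplification and sequencing.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C is converted to U, which reads as T after amplification;
    5-methylcytosine is protected and still reads as C.
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, converted_read):
    """Map each reference cytosine index to True (methylated) or False.

    Assumes (hypothetically) that the read is already aligned to the
    reference with no gaps and on the same strand.
    """
    if len(reference) != len(converted_read):
        raise ValueError("read must be aligned to the reference without gaps")
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), converted_read.upper())):
        if ref == "C":
            if obs == "C":
                calls[i] = True      # protected from conversion: methylated
            elif obs == "T":
                calls[i] = False     # converted: unmethylated
            # any other base is treated as a sequencing error and skipped
    return calls


# Toy example: cytosines at indices 1 and 8 are methylated.
reference = "ACGTCGACCG"
read = bisulfite_convert(reference, methylated_positions=[1, 8])
calls = call_methylation(reference, read)
```

Comparing the converted read with the reference recovers the methylation state of each cytosine, but only for the population of molecules that was sequenced and only at the time the DNA was harvested.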

Recent advances in sequencing technologies have allowed the establishment of methylation maps from multiple cell types in both human (Ziller et al., 2013) and mouse (Hon et al., 2013). However, changes in DNA methylation are dynamic, and it is still largely unknown how epigenomic information dictates spatial and temporal gene expression programs (Smith and Meissner, 2013).

SUMMARY

In some aspects, described herein are methods that allow tracing of real-time changes in DNA methylation in living cells. Methods described herein couple DNA methylation to a detectable readout, allowing detection of the methylation state of a region of genomic DNA. Also described herein are products, e.g., nucleic acid constructs, vectors, cells, and non-human animals of use in the methods. Also described herein are methods of making the nucleic acid constructs, vectors, cells, and non-human animals. Also described herein are methods of identifying an agent that affects the methylation state of a region of DNA in the genome of a cell.

In some embodiments, the methylation state of a region of DNA in the genome of a cell is monitored over time, allowing for detection of changes in DNA methylation pattern. In some embodiments, the methylation state of a region of DNA in the genome of a cell is detected at least once before a cell begins to undergo a change in cell state or a change in cell identity and at least once during the change in cell state or cell identity. In some embodiments, the methylation state of a region of genomic DNA is detected at least once before a cell begins to undergo a change in cell state and at least once after the cell has undergone a change in cell state or cell identity. In some embodiments, a change in methylation of a region of genomic DNA that occurs in association with a change in cell state or a change in cell identity is detected. In some embodiments, a cell is exposed to an agent or condition and a change in methylation of a region of genomic DNA that occurs as a result of the agent or condition is detected.

In some aspects, described herein is a nucleic acid comprising: (i) a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule that is detectable in individual mammalian cells, wherein the promoter is operably linked to the sequence that encodes the reporter molecule.

In some aspects, described herein is a nucleic acid comprising: (i) a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule, wherein the promoter is operably linked to the sequence that encodes the reporter molecule, and (iii) a first homology arm located 5′ from the promoter and a second homology arm located 3′ from the sequence that encodes a reporter molecule, wherein the homology arms comprise sequences that are homologous to sequences that flank a target location in a mammalian genome. In some embodiments the reporter molecule is one that is detectable in individual mammalian cells.

In some embodiments, the mammalian imprinted gene promoter in a nucleic acid described herein comprises at least a portion of a parent-of-origin differentially methylated region (DMR). In some embodiments, the nucleic acid further comprises a first homology arm located 5′ from the promoter and a second homology arm located 3′ from the sequence that encodes a reporter molecule, wherein the homology arms comprise sequences that are homologous to sequences that flank a target location in a mammalian genome. In some embodiments the target location is in proximity to a CpG island (CGI), CpG island shore, superenhancer, enhancer, promoter, or gene body. In some embodiments the CGI is a low density CGI. In some embodiments the CGI is a high density CGI. In some embodiments the target location is within a CpG island, CpG island shore, superenhancer, enhancer, promoter, or gene body. In some embodiments the target location is within 10 kb of a transcription start site (TSS). In some embodiments the target location is in proximity to a genetic locus that is aberrantly methylated in subjects suffering from a disorder associated with aberrant DNA methylation. For example, in some embodiments the target location is aberrantly hypermethylated or aberrantly hypomethylated in subjects suffering from such a disorder.

In some embodiments the imprinted gene promoter is from a gene that is imprinted both in mice and humans. In some embodiments the imprinted gene promoter is from a gene that is imprinted in a species-specific manner. In some embodiments the imprinted gene promoter is from a gene selected from the group consisting of: Snrpn, Igf2r, Gnas, Igf2, Meg3 (Gtl2), Airn, Kcnq1ot1, Mest, Grb10, and Peg10. In some embodiments the imprinted gene promoter is from the Snrpn gene. In some embodiments the sequence of the promoter comprises SEQ ID NO: 1 or SEQ ID NO: 2.

In some embodiments, the reporter molecule in a nucleic acid described herein comprises a fluorescent protein or a luciferase. In some embodiments the fluorescent protein is a green fluorescent protein, red fluorescent protein, or infrared fluorescent protein. In some embodiments the reporter molecule comprises a site-specific recombinase. In some embodiments the site-specific recombinase is Cre. In some embodiments, the nucleic acid further comprises a drug resistance marker or nutritional marker operably linked to a constitutive promoter.

In some embodiments, a nucleic acid comprising (i) a mammalian imprinted gene promoter and (ii) a sequence that encodes a reporter molecule further comprises a CpG-rich region, CpG shore, or low CpG region that is naturally found in a mammalian genome.

In some embodiments, a nucleic acid comprising (i) a mammalian imprinted gene promoter and (ii) a sequence that encodes a reporter molecule further comprises a sequence homologous to a regulatory region of a mammalian gene. In some embodiments the gene is a cell type specific gene. In some embodiments, the regulatory region comprises a CpG-rich region, CpG shore, or low CpG region.

In some embodiments, a nucleic acid comprising (i) a mammalian imprinted gene promoter and (ii) a sequence that encodes a reporter molecule further comprises a STOP cassette that inhibits synthesis of the reporter molecule and is flanked by recombination sites for a site-specific recombinase.

In some aspects, described herein is a vector comprising a nucleic acid comprising: (i) a mammalian imprinted gene promoter and (ii) a sequence that encodes a reporter molecule, wherein the promoter is operably linked to the sequence that encodes the reporter molecule. In some embodiments, the mammalian imprinted gene promoter may be any of the mammalian imprinted gene promoters described herein. In some embodiments, the reporter molecule may be any of the reporter molecules described herein. In some embodiments the vector is a transposon vector, plasmid, retroviral vector, lentiviral vector, or adeno-associated viral vector.

In some aspects, described herein is a kit comprising: (a) one or more nucleic acids comprising (i) a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule, wherein the promoter is operably linked to the sequence that encodes the reporter molecule and, optionally, one or more of the following: (b) a DNA methyltransferase; (c) a transfection reagent; (d) a buffer solution; and (e) instructions for use of the kit.

In some aspects, described herein is a cell comprising a nucleic acid comprising (i) a mammalian imprinted gene promoter and (ii) a sequence that encodes a reporter molecule, wherein the promoter is operably linked to the sequence that encodes the reporter molecule, and wherein the nucleic acid is integrated into the genome of the cell. In some embodiments the mammalian imprinted gene promoter comprises at least a portion of a parent-of-origin differentially methylated region (DMR). In some embodiments the imprinted gene is imprinted both in mice and humans. In some embodiments the imprinted gene is imprinted in a species-specific manner, e.g., in mice but not in humans, or in humans but not in mice. In some embodiments the imprinted gene promoter is a promoter of a gene selected from the group consisting of: Snrpn, Igf2r, Gnas, Igf2, Meg3 (Gtl2), Airn, Kcnq1ot1, Mest, Grb10, and Peg10. In some embodiments the imprinted gene promoter is from the Snrpn gene. In some embodiments the sequence of the promoter comprises SEQ ID NO: 1 or SEQ ID NO: 2.

In some embodiments the reporter molecule may be any of the reporter molecules described herein. In some embodiments the reporter molecule is detectable in individual cells. In some embodiments the reporter molecule comprises a fluorescent protein or a luciferase. In some embodiments the reporter molecule comprises a site-specific recombinase, e.g., Cre.

In some embodiments the nucleic acid is integrated into the genome of the cell in proximity to an enhancer, superenhancer, promoter, gene body, CpG island, CpG island shore, or low CpG density region. In some embodiments the region is a distal regulatory region. In some embodiments the nucleic acid is integrated into the genome at a location no more than 10 kb away from a transcriptional start site. In some embodiments the nucleic acid is integrated into the genome at a location more than 10 kb away from a transcriptional start site.

In some embodiments the cell is a mammalian cell, e.g., a human or mouse cell. In some embodiments the cell is a somatic cell. In some embodiments the cell is a pluripotent stem cell. In some embodiments the cell is a germ cell, stem cell, or zygote. In some embodiments the cell is a primary cell. In some embodiments the cell is a diseased cell. In some embodiments the cell is a cancer cell. In some embodiments the cell is a white blood cell or fibroblast. In some embodiments the cell is a cell that has been isolated from an embryo. In some embodiments the cell is a cell that has been isolated from a subject suffering from a disorder associated with aberrant DNA methylation. In some embodiments, the genomic DNA of the cell comprises at least one region that has aberrant DNA methylation.

In some embodiments, the reporter molecule comprises a site-specific recombinase, and the genome of the cell further comprises recombination sites for the recombinase flanking a DNA segment whose excision or inversion results in a detectable change in the cell. In some embodiments the genome of the cell comprises a sequence encoding a second reporter molecule, wherein excision or inversion of the DNA segment results in turning expression of the second reporter molecule on or off. In some embodiments the second reporter molecule comprises a fluorescent protein or a luciferase. In some embodiments the genome of the cell further comprises a nucleic acid comprising a cell state or cell type specific promoter operably linked to a sequence that encodes an additional reporter molecule. In some embodiments the additional reporter molecule comprises a fluorescent protein or a luciferase. In some embodiments the cell state or cell type specific promoter is an endogenous promoter. In some embodiments the sequence that encodes the additional reporter molecule is integrated into the genome of the cell such that its transcription is under control of the endogenous promoter.

In some aspects, described herein is a non-human mammal comprising at least one cell that comprises a nucleic acid comprising (i) a mammalian imprinted gene promoter and (ii) a sequence that encodes a reporter molecule, wherein the promoter is operably linked to the sequence that encodes the reporter molecule, and wherein the nucleic acid is integrated into the genome of the cell. In some embodiments the mammalian imprinted gene promoter comprises at least a portion of a parent-of-origin differentially methylated region (DMR). In some embodiments the imprinted gene is a gene that is imprinted both in mice and humans. In some embodiments the imprinted gene promoter is from a gene selected from the group consisting of: Snrpn, Igf2r, Gnas, Igf2, Meg3 (Gtl2), Airn, Kcnq1ot1, Mest, Grb10, and Peg10. In some embodiments the promoter is from the Snrpn gene. In some embodiments the sequence of the promoter comprises SEQ ID NO: 1 or SEQ ID NO: 2.

In some embodiments the reporter molecule may be any reporter molecule described herein. In some embodiments the reporter molecule comprises a fluorescent protein or a luciferase. In some embodiments the reporter molecule is detectable in vivo. In some embodiments the reporter molecule comprises a site-specific recombinase, e.g., Cre. In some embodiments the nucleic acid is integrated into the genome of the cell in proximity to an enhancer, superenhancer, promoter, gene body, CpG island, CpG shore, or low density CpG region. In some embodiments the nucleic acid is integrated into the genome at a location no more than 10 kb away from a transcriptional start site. In some embodiments the non-human mammal is a non-human primate or rodent. In some embodiments the non-human mammal is a mouse.

In some embodiments all or substantially all somatic cells of the non-human mammal have the nucleic acid or polypeptide integrated into their genome. In some embodiments the reporter molecule comprises a site-specific recombinase and the genome of the at least one cell further comprises recombination sites for the recombinase flanking a region that encodes a second reporter molecule. In some embodiments the second reporter molecule comprises a fluorescent protein or a luciferase. In some embodiments the genome of the at least one cell further comprises a cell state or cell type specific promoter operably linked to a region that encodes an additional reporter molecule. In some embodiments the additional reporter molecule comprises a fluorescent protein or a luciferase. In some embodiments the nucleic acid is integrated into a superenhancer, enhancer, promoter, gene body, CpG island, CpG shore, or low density CpG region. In some embodiments the nucleic acid is integrated into the genome at a location no more than 10 kb away from a transcriptional start site of a gene.

In some embodiments the nucleic acid is integrated into the genome of the at least one cell in proximity to a region that has aberrant DNA methylation in subjects suffering from a disorder associated with aberrant DNA methylation. In some embodiments the animal has a mutation associated with a disorder associated with aberrant DNA methylation. In some embodiments the animal has a mutation in at least one gene that encodes a DNA modifying enzyme. In some embodiments the mammal serves as a model for a human disorder associated with aberrant DNA methylation.

In some aspects, described herein is a method of generating an engineered mammalian cell comprising: (a) providing a mammalian cell; (b) introducing a nucleic acid or vector that comprises (i) a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule operably linked to the promoter into the cell; and (c) maintaining the cell for a time sufficient for the nucleic acid or vector to be integrated into the genome of the cell or a descendant of the cell, thereby generating an engineered mammalian cell. In some embodiments the method comprises contacting the cell with a targetable nuclease that cuts DNA in the genome of the cell at a location in proximity to a region of interest. In some embodiments contacting the cell with a targetable nuclease comprises expressing the targetable nuclease in the cell. In some embodiments the targetable nuclease comprises a Cas9 protein, and the method comprises contacting the cell with a guide RNA that targets the nuclease to a location in proximity to the region of interest (e.g., within the region of interest). The region of interest may be any region of interest described herein. The reporter molecule may be any reporter molecule described herein. The imprinted gene promoter may be any imprinted gene promoter described herein.

In some aspects, described herein is a method of detecting the methylation state of a DNA region of interest in the genome of a cell comprising: (a) providing one or more cells comprising (i) a mammalian imprinted gene promoter and (ii) a sequence that encodes a reporter molecule, wherein the sequence that encodes a reporter molecule is operably linked to the promoter and the nucleic acid is integrated in proximity to a region of interest in the genome of the cell; and (b) measuring expression of the reporter molecule by the one or more cells, wherein the level of expression of the reporter molecule is indicative of the level of methylation of the region of interest, thereby detecting the methylation state of the region of interest. In some embodiments expression of the reporter molecule is indicative of hypomethylation of the DNA region of interest and lack of expression of the reporter molecule is indicative of hypermethylation of the DNA region of interest. In some embodiments, measuring expression of the reporter molecule comprises measuring fluorescence or bioluminescence. In some embodiments measuring expression of the reporter molecule comprises performing fluorescence or bioluminescence imaging. In some embodiments measuring expression of the reporter molecule comprises performing fluorescence activated cell sorting (FACS).
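
As a hedged illustration of how a reporter readout might be mapped to a methylation call, the following sketch gates per-cell fluorescence values against a threshold and labels the population. The threshold, the 0.5 population cutoff, and the function names are hypothetical choices made for illustration; in practice a gate would be set empirically, e.g., from a reporter-free control population.

```python
def fraction_positive(intensities, threshold):
    """Fraction of cells whose reporter signal exceeds the gating threshold."""
    if not intensities:
        raise ValueError("no cells were measured")
    positive = sum(1 for value in intensities if value > threshold)
    return positive / len(intensities)


def infer_methylation_state(frac_positive, cutoff=0.5):
    """Label a population from its reporter-positive fraction.

    Reporter expression is taken as indicative of hypomethylation and lack
    of expression as indicative of hypermethylation. The 0.5 cutoff is an
    arbitrary illustrative value, not one taken from the description.
    """
    if frac_positive >= cutoff:
        return "hypomethylated"
    return "hypermethylated"


# Toy per-cell fluorescence values (arbitrary units) and a hypothetical gate.
cells = [10, 250, 300, 5]
frac = fraction_positive(cells, threshold=100)
state = infer_methylation_state(frac)
```

Because the readout is per cell, the same gating could be applied to individual cells rather than to the population as a whole.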

In some embodiments the reporter molecule comprises a site-specific recombinase, and the genome of the cell further comprises recombination sites for the recombinase flanking a DNA segment whose excision or inversion results in turning expression of a second reporter molecule on or off, and measuring expression of the reporter molecule comprises measuring the second reporter molecule. In some embodiments the second reporter molecule comprises a fluorescent protein or a luciferase. In some embodiments the genome of the cell further comprises a cell type or cell state specific promoter operably linked to a nucleic acid sequence that encodes an additional reporter molecule. In some embodiments the promoter is an endogenous promoter of a cell type or cell state specific gene. In some embodiments the method further comprises measuring expression of the additional reporter molecule.

In some embodiments the method of detecting the methylation state of a DNA region of interest comprises exposing the cell to an agent or condition of interest; measuring expression of the reporter molecule encoded by a sequence that is operably linked to a mammalian imprinted gene promoter; and comparing the level of expression of the reporter molecule with a control value, wherein a difference between the measurement and the control value indicates that the agent or condition affects methylation of the region of interest. In some embodiments the agent is a small molecule. In some embodiments the method comprises placing the cell under conditions in which it undergoes a change in cell state; and measuring expression of the reporter molecule at one or more time points during the change in cell state, one or more time points after the change in cell state, or both. In some embodiments the change in cell state comprises a change to a more differentiated state or to a less differentiated state. In some embodiments the change in cell state comprises a change from a somatic cell to an induced pluripotent stem cell. In some embodiments the method further comprises measuring expression of one or more markers of cell identity or cell state by the one or more cells. In some embodiments expression of the reporter molecule is measured in an individual cell and its descendants. In some embodiments the cell is in a subject.

In some aspects, described herein is a method of monitoring the methylation state of a DNA region of interest in a cell over a period of time comprising steps of: (a) providing one or more cells comprising (i) a mammalian imprinted gene promoter and (ii) a sequence that encodes a reporter molecule, wherein the sequence that encodes the reporter molecule is operably linked to the promoter and the nucleic acid is integrated in proximity to a region of interest in the genome of the cell; and (b) measuring expression of the reporter molecule by the one or more cells at two or more time points, wherein the level of expression of the reporter molecule is indicative of the level of methylation of the region of interest, thereby monitoring the methylation state of the region of interest over a period of time. In some embodiments, a decrease in expression of the reporter molecule between first and second time points is indicative of an increase in the level of methylation of the region of interest, and an increase in expression of the reporter molecule between first and second time points is indicative of a decrease in the level of methylation of the region of interest. In some embodiments at least two of the time points are at least 12 hours apart. In some embodiments at least two of the time points are at least 7 days apart. In some embodiments the method comprises comparing the methylation state of the region of interest at a first time point with the methylation state of the region of interest at a second time point, thereby determining whether methylation of the region of interest increased or decreased between the first and second time points. In some embodiments measuring expression of the reporter molecule comprises measuring fluorescence or bioluminescence. In some embodiments measuring expression of the reporter molecule comprises performing fluorescence or bioluminescence imaging. In some embodiments measuring expression of the reporter molecule comprises performing fluorescence activated cell sorting (FACS). In some embodiments the reporter molecule comprises a site-specific recombinase, and the genome of the cell further comprises recombination sites for the recombinase flanking a DNA segment whose excision or inversion results in turning expression of a second reporter molecule on or off, and measuring expression of the reporter molecule comprises measuring the second reporter molecule. In some embodiments the second reporter molecule comprises a fluorescent protein or a luciferase. In some embodiments the genome of the cell further comprises a cell type or cell state specific promoter operably linked to a nucleic acid sequence that encodes an additional reporter molecule. In some embodiments the cell type or cell state specific promoter is an endogenous promoter of a cell type or cell state specific gene.
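
The time-course comparison can be sketched as follows, assuming the inverse relationship in which reporter expression falls as methylation rises. The function name, the tolerance parameter, and the toy measurements (hours mapped to fraction of reporter-positive cells) are assumptions introduced only for illustration.

```python
def methylation_trend(expression_by_time, tolerance=0.0):
    """Classify the change in methylation between consecutive time points.

    expression_by_time maps a time point (e.g., hours) to a reporter
    expression level. A rise in expression is read as decreased methylation
    and a fall as increased methylation. Differences whose magnitude does not
    exceed `tolerance` (a hypothetical noise margin) are reported as no
    detectable change.
    """
    points = sorted(expression_by_time.items())
    trends = []
    for (t0, e0), (t1, e1) in zip(points, points[1:]):
        delta = e1 - e0
        if delta > tolerance:
            change = "methylation decreased"
        elif delta < -tolerance:
            change = "methylation increased"
        else:
            change = "no change detected"
        trends.append((t0, t1, change))
    return trends


# Toy data: measurements at 0 h, 12 h, and 168 h (7 days).
measurements = {0: 0.9, 12: 0.9, 168: 0.2}
trends = methylation_trend(measurements, tolerance=0.05)
```

Sorting by time point before pairing means the measurements can be supplied in any order.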

In some embodiments the method of monitoring the methylation state of a DNA region of interest comprises: exposing the cell to an agent or condition of interest; measuring expression of the reporter molecule at two or more time points; comparing the level of expression of the reporter molecule between two or more of the time points, wherein a difference between at least two of the time points indicates that the agent or condition affects methylation of the region of interest. In some embodiments the agent is a small molecule. In some embodiments the method comprises placing the cell under conditions in which it undergoes a change in cell state; and measuring expression of the reporter molecule at one or more time points during the change in cell state, one or more time points after the change in cell state, or both. In some embodiments the change in cell state comprises a change to a more differentiated state or to a less differentiated state. In some embodiments the change in cell state comprises a change from a somatic cell to an induced pluripotent stem cell. In some embodiments the method further comprises measuring expression of one or more endogenous genes or one or more additional reporter molecules by the one or more cells. In some embodiments the one or more endogenous genes or reporter molecules is a marker of cell identity or cell state. In some embodiments expression of the reporter molecule is measured in an individual cell and in one or more descendants of the cell.

In some aspects, described herein is a method of evaluating the effect of an agent on the methylation state of a DNA region of interest in a cell comprising steps of: contacting one or more cells comprising (i) a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule, wherein the sequence that encodes the reporter molecule is operably linked to the promoter and the nucleic acid is integrated in proximity to a region of interest in the genome of the cell, with a test agent; measuring expression of the reporter molecule; and comparing the level of expression of the reporter molecule with a control value, wherein a difference between the measured value and the control value indicates that the test agent modulates the methylation state of the region of interest. In some embodiments the test agent is a small molecule. In some embodiments the test agent is a protein or a nucleic acid. In some embodiments the method comprises detecting an increase in the level of expression of the reporter molecule as compared to the control value, thereby determining that the agent decreases methylation of the region of interest. In some embodiments the method comprises detecting a decrease in the level of expression of the reporter molecule as compared to the control value, thereby determining that the agent increases DNA methylation of the region of interest. In some embodiments the region of interest has aberrant DNA methylation in cells affected by a disorder. In some embodiments the region of interest is aberrantly hypermethylated in a disorder of interest, and the method comprises detecting an increase in the level of expression of the reporter molecule as compared to the control value, thereby determining that the agent decreases DNA methylation of the region of interest. In some embodiments the region of interest is aberrantly hypomethylated in a disorder of interest, and the method comprises detecting a decrease in the level of expression of the reporter molecule as compared to the control value, thereby determining that the agent increases DNA methylation of the region of interest.
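
The comparison of a measured reporter level with a control value can be sketched as below. The function name, the use of a simple mean over replicate wells, and the `min_difference` margin are illustrative assumptions; no particular statistical test is prescribed by the description, and a real screen would likely apply one.

```python
from statistics import mean


def evaluate_agent(treated_levels, control_value, min_difference):
    """Classify a test agent from reporter expression in treated cells.

    treated_levels: reporter expression measurements from treated cells.
    control_value: reporter expression expected without the agent.
    min_difference: hypothetical margin a difference must reach before it is
        treated as an effect rather than measurement noise.
    """
    difference = mean(treated_levels) - control_value
    if difference >= min_difference:
        # More reporter expression than control: less methylation.
        return "decreases methylation"
    if difference <= -min_difference:
        # Less reporter expression than control: more methylation.
        return "increases methylation"
    return "no effect detected"


# Toy screen: fraction of reporter-positive cells in replicate wells.
demethylating_hit = evaluate_agent([0.8, 0.9, 0.7], control_value=0.2, min_difference=0.1)
methylating_hit = evaluate_agent([0.05, 0.15], control_value=0.6, min_difference=0.1)
inactive = evaluate_agent([0.2, 0.2], control_value=0.2, min_difference=0.1)
```

For a region that is aberrantly hypermethylated in a disorder, agents classified as decreasing methylation would be the candidates of interest, and the converse holds for an aberrantly hypomethylated region.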

In some aspects, described herein is a method of identifying an agent that modulates the methylation state of a DNA region of interest in a cell comprising steps of: contacting one or more cells of any of claims 24-54 with a test agent; measuring expression of the reporter molecule; comparing the level of expression of the reporter molecule with a control value; and detecting a difference between the measurement and the control value, thereby identifying the test agent as an agent that modulates the methylation state of a DNA region of interest in a cell. In some embodiments the test agent is a small molecule, a protein, or a nucleic acid. In some embodiments the method comprises detecting an increase in the level of expression of the reporter molecule as compared to the control value, thereby identifying the test agent as an agent that decreases the level of methylation of the region of interest. In some embodiments the method comprises detecting a decrease in the level of expression of the reporter molecule as compared to the control value, thereby identifying the test agent as an agent that increases the level of methylation of the region of interest. In some embodiments the region of interest has aberrant DNA methylation in cells affected by a disorder. For example, in some embodiments the region of interest has aberrantly high DNA methylation in cells affected by a disorder. In some embodiments the region of interest has aberrantly low DNA methylation (e.g., aberrant loss of DNA methylation) in cells affected by a disorder. In some embodiments the method further comprises administering a test agent that modulates methylation of the region of interest to a mammalian subject. In some embodiments the mammalian subject serves as an animal model for a disorder associated with aberrant DNA methylation.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.

FIGS. 1A-1G illustrate that an active minimal Snrpn promoter can be repressed in cis by means of spreading of DNA methylation into the promoter region. (FIG. 1A) Schematic representation of the sleeping-beauty based vectors. Endogenous CpG Islands (CGI) of Dazl and Gapdh genes were cloned upstream of a minimal Snrpn promoter region driving GFP. Lollipops schematically represent individual CpGs. (FIG. 1B) Flow cytometric analysis of V6.5 mESCs grown for 4 weeks in serum+LIF, following stable integration of unmethylated Gapdh and Dazl reporter vectors, demonstrating robust repression of GFP signal in the Dazl reporter cells. Shown are the mean percentages of GFP negative cells ± STD of two biological replicates. (FIG. 1C) Flow cytometric analysis of the proportion of GFP positive cells of Gapdh-GFP-positive sorted cells (left panel) and Dazl-GFP-negative sorted cells (right panel), following 7 days in culture. (FIG. 1D and FIG. 1E) Phase and fluorescence images of the sorted V6.5 mESCs, comprising stable integration of the Gapdh (left) and Dazl (right) vectors (FIG. 1D), and following prolonged culturing for 7 weeks (FIG. 1E). (FIG. 1F and FIG. 1G) Bisulfite sequencing analysis of the stably transfected Gapdh (FIG. 1F) and Dazl (FIG. 1G) reporter cell lines was performed on the gene promoter-associated CGI (left) and the downstream Snrpn promoter region (right). Open circles represent unmethylated CpGs; filled circles represent methylated CpGs.

FIGS. 2A-2I demonstrate that an in vitro repressed Snrpn promoter can be reactivated in cis by means of spreading of DNA demethylation into the promoter region. (FIG. 2A and FIG. 2B) Bisulfite sequencing analysis of the in vitro methylated Gapdh (FIG. 2A) and Dazl (FIG. 2B) vectors was performed on the gene promoter-associated CGI (left panels) and the downstream Snrpn promoter region (right panels). (FIG. 2C) Phase and fluorescence images of the stably integrated V6.5 mESCs, harboring Gapdh (left) and Dazl (right) in vitro methylated vectors, following one week of antibiotics selection. (FIG. 2D) Flow cytometric analysis of the proportion of GFP positive cells in V6.5 mESCs, stably integrated with either Gapdh (left panel) or Dazl (right panel) in vitro methylated vectors, following 2 weeks in culture. (FIG. 2E and FIG. 2F) Bisulfite sequencing analysis of the stably transfected Gapdh (FIG. 2E) and Dazl (FIG. 2F) reporter cell lines was performed on the gene promoter-associated CGI (left panels) and the downstream Snrpn promoter region (right panels). (FIG. 2G) Flow cytometric analysis of the proportion of GFP positive cells in V6.5 mESCs (upper panel), and Dnmt1 KO mESCs (lower panel), stably integrated with in vitro methylated Dazl reporter vector. (FIG. 2H) Flow cytometric analysis of the proportion of GFP negative cells in mESCs deficient for both Dnmt3a and Dnmt3b (Dnmt3ab KO), which were stably integrated with unmethylated Gapdh (upper panel) and Dazl (lower panel) reporter vectors. (FIG. 2I) Flow cytometric analysis of the proportion of GFP negative V6.5 mESCs grown in 2i+LIF, following stable integration of Gapdh (upper panel) and Dazl (lower panel) unmethylated reporter vectors.

FIGS. 3A-3D illustrate the generation of DNA methylation reporter cell lines for the pluripotent-specific miR290 and Sox2 SE regions. (FIG. 3A) Regional view depicting the DNA methylation (upper panel) and chromatin (lower panel) landscape of miR290 upstream pluripotent-specific SE. Shown are average methylation levels and enrichment of chromatin marks in mouse undifferentiated cells (green) and in adult tissues (gold), with respect to the genomic organization of the genes. DNA methylation varies from 1 (hypermethylated) to 0 (hypomethylated); characteristic clusters of typical enhancer marks and binding of tissue-specific TF determine the SE region (light blue). (FIG. 3B) CRISPR/Cas-based strategy used to integrate the DNA methylation reporter into the endogenous SE region. Green sequence: endogenous miR290 CpG region; black sequence: targeting vector; red sequence: PAM recognition site. (FIG. 3C) Phase and fluorescence images of correctly integrated DNA methylation reporter cell lines for miR290 (upper panel) and Sox2 (lower panel) endogenous SE regions. GFP marks endogenous expression levels of Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE regions. (FIG. 3D) Bisulfite sequencing analysis was performed on undifferentiated mESCs harboring the DNA methylation reporter in either miR290 SE region (upper panel) or Sox2 SE region (lower panel). For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right).

FIGS. 4A-4E show the dynamics of de novo DNA methylation of miR290 and Sox2 SE regions upon in vitro differentiation. (FIG. 4A) Schematic representation of the RA-based differentiation protocol used on miR290 and Sox2 reporter cell lines. GFP marks endogenous expression levels of Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE regions. (FIG. 4B) Flow cytometric analysis of the proportion of GFP positive cells (X axis) and tdTomato positive cells (Y axis) during 7 days of differentiation of miR290 #21 (upper panel) and Sox2 #2 (lower panel) reporter cell lines. (FIG. 4C) Bar graph summarizing the proportion of the different cell populations during the course of 7 days RA differentiation for both miR290 #21 (upper panel) and Sox2 #2 (lower panel) reporter cell lines. Data represent two biological replicates. R: tdTomato; G: GFP. (FIG. 4D and FIG. 4E) Bisulfite sequencing analysis on the three main cell populations, sorted at 48 hours following initial treatment with RA. For both miR290 #21 (FIG. 4D) and Sox2 #2 (FIG. 4E) cell lines, the PCR amplicon (marked with dashed line) includes the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right). R: tdTomato; G: GFP.

FIGS. 5A-5G show the dynamics of DNA demethylation of miR290 and Sox2 SE regions during cellular reprogramming. (FIG. 5A) miR290 (upper panel) and Sox2 (lower panel) reporter chimeric embryos (Experiment embryos). For controls, Gapdh CGI reporter mESCs driving GFP and constitutively expressing tdTomato (Control Gapdh-GFP and tdTomato, respectively), were injected into a host blastocyst. Both miR290 and Sox2 embryos were compared to the same control embryo (left embryo in each panel). (FIG. 5B) Schematic representation of the experimental procedure to monitor the dynamics of demethylation during reprogramming of miR290 and Sox2 reporter cell lines. GFP marks endogenous expression levels of Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE regions. (FIG. 5C) Flow cytometric analysis of the proportion of GFP positive cells (X axis) and tdTomato positive cells (Y axis) in P0 MEFs derived from miR290 #21 (left) and Sox2 #2 (right) chimeric embryos. (FIG. 5D) Bisulfite sequencing analysis was performed on P0 MEFs derived from miR290 #21 (upper panel) and Sox2 #2 (lower panel) chimeras. For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right). (FIG. 5E) Analysis of the proportion of GFP positive cells (X axis) and tdTomato positive cells (Y axis) during the course of reprogramming of MEFs derived from miR290 #21 (upper panel) and Sox2 #2 (lower panel) chimeras. Shown are flow cytometric data from different time points following addition of dox supplemented with 3C culture condition. (FIG. 5F) Representative images of established miR290 and Sox2 iPSC lines, derived from sorted double positive (tdTomato⁺/GFP⁺) colonies. (FIG. 5G) Bisulfite sequencing analysis was performed on P2 iPSCs derived from miR290 #21 (upper panel) and Sox2 #2 (lower panel) MEFs. For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right).

FIGS. 6A-6B illustrate that a minimal Snrpn promoter can be utilized to report on real time changes in DNA methylation. (FIG. 6A) Shown are average methylation levels in different mouse cell types, with respect to the Snrpn promoter region. DNA methylation varies from 1 (hypermethylated) to 0 (hypomethylated); the imprinted DMR is marked by light blue. Note the intermediate methylation levels, representing a typical monoallelic methylation at imprinted DMR regions. (FIG. 6B) Flow cytometric analysis of V6.5 mESCs, following stable integration of unmethylated Gapdh and Dazl reporter vectors. Shown are the mean percentages of GFP negative cells ±STD of three biological replicates.

FIGS. 7A-7C illustrate the integration of DNA methylation reporter into pluripotent-specific SE regions. (FIG. 7A) Regional view depicting the DNA methylation (upper panel) and chromatin (lower panel) landscape of Sox2 upstream pluripotent-specific SE. Shown are average methylation levels and enrichment of chromatin marks in mouse undifferentiated cells (green) and in adult tissues (gold), with respect to the genomic organization of the genes. DNA methylation varies from 1 (hypermethylated) to 0 (hypomethylated); characteristic clusters of typical enhancer marks and binding of tissue-specific TF determine the SE region (light blue). (FIG. 7B) CRISPR/Cas-based strategy used to integrate the DNA methylation reporter into the SE region. Green sequence: endogenous Sox2 CpG region; black sequence: targeting vector; red sequence: PAM recognition site. (FIG. 7C) Southern blot analysis (upper panels) and PCR (lower panels) were used to identify single and correct integration of GLINER into the endogenous miR290 (left) and Sox2 (right) SE region. Restriction enzymes used to detect the tdTomato-based probe are designated above.

FIGS. 8A-8B illustrate reprogramming of MEFs isolated from miR290 and Sox2 reporter cell lines. (FIG. 8A) Representative phase and fluorescence images of established MEFs derived from miR290 #21 (left) and Sox2 #2 (right) mESC lines, demonstrating complete repression of both tdTomato and GFP signals. (FIG. 8B) Analysis of the proportion of GFP positive cells (X axis) and tdTomato positive cells (Y axis) of Sox2 iPSCs following a split at day 28 of reprogramming. Shown are two consecutive passages, demonstrating a shift in the single GFP positive population towards a double positive cell population.

FIG. 9 depicts the sequence from Snrpn promoter region and minimal Snrpn promoter.

FIG. 10 depicts the sequence of Igf2r promoter-associated differentially methylated region.

FIG. 11 depicts the sequence of Gnas promoter-associated differentially methylated region.

FIG. 12 depicts the sequence of Meg3 promoter-associated differentially methylated region.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

Glossary

Certain terms used in the present application, and related information, are collected here for convenience. General or specific features of the description of terms in this glossary may be applied in or to any aspect, embodiment, context, description, or claim in which such term is used.

The term “aberrant DNA methylation” is used to indicate that the overall level of DNA methylation in the genome of one or more cells of interest and/or the DNA methylation level of one or more regions of DNA in the genome of one or more cells of interest is detectably different from a control level that is typical of that found in normal cells. If the level of methylation differs detectably from the control level (is higher than the control level or lower than the control level) in either or both strands of a region of genomic DNA, the region is considered to be aberrantly methylated. The control level used for particular cell(s) of interest may be obtained from control cells maintained under the same or comparable conditions as the cells of interest (so long as those conditions are not known to significantly affect DNA methylation) or under standard conditions, which refers to typical culture conditions for cells of a given type or conditions in a normal, healthy subject or in a typical biological sample obtained from a normal, healthy subject. The control level of methylation for a particular DNA region is typically the level of methylation that such region normally exhibits when present in normal cells in its natural location. Normal cells from which a control level is obtained are typically of the same species as cells of interest for which they serve as a control. Control cells may be of the same cell type, developmental stage, and/or differentiation state as cells for which they serve as a control. For example, if a DNA region is known or suspected to be methylated in a cell or tissue specific manner, cells of the same type may be used as control cells; if methylation of a DNA region is known or suspected to be developmentally regulated, cells of the same developmental stage may be used as control cells. In some embodiments, the cell(s) of interest are obtained from a subject suffering from a disorder. Normal cells could be cells obtained from a subject not suffering from a disorder, e.g., a healthy subject. In some embodiments, normal cells are cells in the same tissue or organ as cells affected by a disorder, but located outside the area affected by the disorder. A control level may be measured using the same or a comparable assay as that used to obtain a value with which the control value is compared. Historical controls (e.g., values reported in the scientific literature or in databases or online resources such as the UCSC Genome Browser or GENCODE (available on the worldwide web at subdomain gencodegenes.org; ENCODE Project Consortium. Nature. 2012; 489(7414):57-74)) may be used.

The term “biological sample” or “sample” refers to any biological specimen. In general, a biological sample of interest herein comprises one or more cells, tissue, or cellular material (e.g., material derived from cells, such as a cell lysate or fraction thereof). A biological sample may be obtained from (i.e., originates from, was initially removed from) a subject. In some embodiments a biological sample contains at least some intact cells. In some embodiments a biological sample retains at least some of the microarchitecture of a tissue from which it was removed. A biological sample may be subjected to one or more processing steps after having been obtained from a subject and/or may be split into one or more portions. The term “biological sample” encompasses processed samples, portions of samples, etc., and such samples are considered to have been obtained from the subject from whom the initial sample was removed. In some embodiments a sample may be obtained from an individual who has been diagnosed with or is suspected of having a mitochondrial disorder. A sample, e.g., a sample used in a method or composition disclosed herein, may have been procured directly from a subject, or indirectly, e.g., by receiving the sample from one or more persons who procured the sample directly from the subject, e.g., by a procedure on the subject.

The term “DNA region of interest” (also referred to as a “region of interest”) refers to any DNA region selected by the artisan, e.g., for use in a product described herein or for use in or analysis according to methods described herein. A DNA region may be part of a larger piece of DNA or may be a separate piece of DNA with free 5′ and 3′ termini. In some embodiments a DNA region of interest is a stretch of DNA within a chromosome. In some embodiments a DNA region of interest is a segment of genomic DNA that is naturally present in the genome of a cell in its normal location. In some embodiments a DNA region of interest is a DNA segment that has been inserted into the genome of a cell by the hand of man. The DNA region of interest may be one that occurs naturally in the genome but at a different location from the location at which it is inserted. The DNA region of interest may be one for which the nucleotide sequence is contained in a publicly available database or other publicly available resource. The DNA region of interest may also be a naturally occurring variation of a reference nucleotide sequence (e.g., a sequence contained in a publicly available database), including, for example, polymorphic variations of the sequence. A DNA region of interest may comprise a DNA element such as a promoter, enhancer, CpG island, gene body, or a portion thereof. For example, the DNA region of interest may comprise a promoter in an RGM construct (e.g., polymorphic variants of a mammalian imprinted gene promoter such as might possibly exist in different individuals or, where relevant, different strains or substrains). One of ordinary skill in the art appreciates, for example, that genome sequences from a variety of different mouse strains and substrains are available and that sequences from any such strain or substrain (or individual) could be used in various embodiments, and that one could obtain nucleic acids comprising a mammalian imprinted gene promoter or portion thereof, or other sequences such as those of a DNA region of interest by, for example, amplification using appropriate primers, regardless of whether the genome of such individual, strain, or substrain has been sequenced. It is notable that there are a large number of publicly available sequenced mouse genomes (see, for example, worldwide web at subdomain sanger.ac.uk/resources/mouse/genomes/). In addition, one of ordinary skill in the art appreciates that many identified polymorphisms and other genetic variants can be found in the NCBI's Single Nucleotide Polymorphism database (dbSNP), for humans and various other species.

In certain embodiments the length of a region of DNA is between about 100 base pairs (bp) and about 500 bp, between about 500 bp and about 1000 bp (1 kb), between about 1 kb and about 2 kb, between about 2 kb and about 3 kb, between about 3 kb and about 4 kb, between about 4 kb and about 5 kb, between about 5 kb and about 10 kb, between about 10 kb and about 20 kb, or between about 20 kb and about 50 kb. In some embodiments a DNA region of interest comprises between about 10 and about 25 CpGs, between about 25 and about 50 CpGs, between about 50 and about 100 CpGs, between about 100 and about 250 CpGs, between about 250 and about 500 CpGs, between about 500 and about 1000 CpGs, or more.

The term “hypermethylation” refers to a higher level of methylation than the average level of methylation in the mammalian genome. A DNA region is considered hypermethylated if at least 80% of the CpG dinucleotides in the region are methylated. In some embodiments, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or more (e.g., 100%) of the CpGs in the region are methylated. Where indicated or evident from the context, the term “hypermethylation” refers to an aberrantly high level of methylation as compared with a control level or an increased level of methylation as compared with a particular level with which it is compared. For example, if a particular region of genomic DNA has a level of methylation of 70% in cancer cells and normally has a level of methylation of 10% in normal cells, the DNA region is considered to be hypermethylated in cancer cells.

The term “hypomethylation” refers to a lower level of methylation than the average level of methylation in the mammalian genome. A DNA region is considered hypomethylated if no more than 50% of the CpG dinucleotides in the region are methylated. In some embodiments, no more than 40%, no more than 30%, no more than 20%, no more than 10%, no more than 5%, no more than 2%, or no more than 1% of the CpGs in the region are methylated. Where indicated or evident from the context, the term “hypomethylation” refers to an aberrantly low level of methylation as compared with a control level. For example, if a particular region of genomic DNA has a level of methylation of 10% in cancer cells and normally has a level of methylation of 70% in normal cells, the DNA region is considered to be hypomethylated in cancer cells.
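
As a minimal sketch of the default cutoffs stated in the definitions of hypermethylation and hypomethylation (at least 80% and no more than 50% of CpGs methylated, respectively), a region could be classified as shown below. The function name and the "intermediate" label for regions between the two cutoffs are assumptions made for illustration.

```python
def classify_region(methylated_cpgs, total_cpgs):
    """Classify a DNA region by the fraction of its CpGs that are methylated.

    Uses the default cutoffs from the glossary: at least 80% methylated is
    hypermethylated, no more than 50% is hypomethylated. The label
    "intermediate" is an assumed name for everything in between.
    """
    if total_cpgs <= 0 or not 0 <= methylated_cpgs <= total_cpgs:
        raise ValueError("counts must satisfy 0 <= methylated <= total, total > 0")
    # Integer arithmetic avoids floating-point surprises at the cutoffs.
    if methylated_cpgs * 100 >= 80 * total_cpgs:
        return "hypermethylated"
    if methylated_cpgs * 100 <= 50 * total_cpgs:
        return "hypomethylated"
    return "intermediate"


if __name__ == "__main__":
    print(classify_region(45, 50))  # 90% of CpGs methylated
    print(classify_region(5, 50))   # 10% of CpGs methylated
```

The context-dependent sense of either term (a level that is aberrant relative to a control) requires a comparison with a control level and is not captured by fixed cutoffs.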

“Imprinting” refers to the differential expression of alleles of the same gene in a parent-of-origin-specific manner, or to the biological process by which such a pattern is established. An “imprinted gene” is a gene that is subject to imprinting. Mammalian somatic cells are normally diploid, i.e., they contain two homologous sets of autosomes (chromosomes that are not sex chromosomes), one set inherited from each parent, and a pair of sex chromosomes. Thus, mammalian somatic cells normally contain two copies of each autosomal gene: a maternal copy and a paternal copy. The two copies (often referred to as “alleles”) may be identical or may differ at one or more nucleotide positions. For most genes, the alleles inherited from the mother and father exhibit similar expression levels. In contrast, imprinted genes are normally expressed in a parent-of-origin specific manner: either the maternal allele (the allele on the chromosome inherited from the mother) is expressed and the paternal allele (the allele present on the chromosome inherited from the father) is not, or the paternal allele is expressed and the maternal allele is not. The allele that is not expressed may be referred to as the “imprinted allele” or “imprinted copy”. Imprinted genes can occur in large, coordinately regulated clusters or small domains composed of only one or two genes. Imprinting has generally been found to be conserved between mice and humans, i.e., if a gene is imprinted in mice, the orthologous gene is typically imprinted in humans as well, and vice versa. Parental allele-specific expression of imprinted genes is generally due to an imprinting control region.

As used herein, an “imprinting control region” (ICR), also referred to as an “imprinting control center”, is a DNA region that controls the imprinting of at least one gene (typically a cluster of genes). In other words, ICRs control the mono-allelic expression of the at least one gene in a manner that depends on the parental origin of the alleles. An ICR must be on the same chromosome as the imprinted gene(s) whose expression it affects but can be located a considerable distance away (e.g., up to several megabases away). ICRs are differentially methylated and are examples of differentially methylated regions (DMRs).

The term “isolated” means 1) separated from at least some of the components with which it is usually associated in nature; 2) prepared or purified by a process that involves the hand of man; and/or 3) not occurring in nature, e.g., present in an artificial environment. In some embodiments an isolated nucleic acid is a nucleic acid that is not found in nature and/or is outside a cell. In some embodiments an isolated cell is a cell that has been removed from a subject, generated in vitro, separated from at least some other cells in a cell population or sample, or that remains after at least some other cells in a cell population or sample have been removed or eliminated.

The term “level of methylation” refers to the proportion of cytosine nucleotide residues that are methylated within a given region of DNA, i.e., the total number of methylated cytosine residues in the region divided by the total number of nucleotides in the region. DNA methylation in mammals occurs most frequently on cytosines in CpG dinucleotides, and the level of methylation is often the same or about the same as the level of CpG methylation. Where the present disclosure refers to a level of methylation, certain embodiments relate specifically to the level of CpG methylation, i.e., the number of CpGs in the region that are methylated on the cytosine residue divided by the total number of CpGs in the region.
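
The level of CpG methylation defined above can be illustrated with a short sketch that reads methylation calls from a single bisulfite-converted sequence, relying on the fact that bisulfite treatment converts unmethylated cytosine to uracil (read as T after amplification) while leaving 5-methylcytosine as C. The function name is hypothetical, and the sketch assumes a forward-strand read of the same length as the reference, aligned without gaps, and complete bisulfite conversion.

```python
def cpg_methylation_level(reference, bisulfite_read):
    """Return methylated CpGs / total informative CpGs for one read.

    Assumes (for illustration only) an ungapped, forward-strand read of the
    same length as the reference and complete bisulfite conversion.
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must have the same length")
    reference = reference.upper()
    read = bisulfite_read.upper()
    methylated = 0
    total = 0
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue  # only cytosines in a CpG context are counted
        if read[i] == "C":    # protected from conversion: methylated
            methylated += 1
            total += 1
        elif read[i] == "T":  # converted: unmethylated
            total += 1
        # any other base is treated as uninformative and skipped
    if total == 0:
        raise ValueError("no informative CpG positions in this read")
    return methylated / total


if __name__ == "__main__":
    # Three CpGs in the reference; only the first remains C in the read.
    print(cpg_methylation_level("ACGTCGACGT", "ACGTTGATGT"))
```

A level for a cell population would typically be obtained by pooling such counts over many reads or clones covering the same region.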

“Modulate” as used herein means to decrease (e.g., inhibit, reduce, suppress) or increase (e.g., stimulate, activate, enhance) a level, response, property, activity, pathway, or process. A “modulator” is an agent capable of modulating a level, response, property, activity, pathway, or process. In some embodiments modulation may refer to inhibition by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99%. In some embodiments modulation may refer to an increase by at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, 100%, 200% (2-fold), 5-fold, 10-fold, or more.

The terms “approximately” or “about” as used herein generally include numbers that fall within a range of 20% or in some embodiments within a range of 10% of a number or in some embodiments within a range of 5% of a number in either direction (greater than or less than the number) unless otherwise stated or otherwise evident from the context (except where such number would impermissibly exceed 100% of a possible value). Where the number is a nucleotide or amino acid position, “about” encompasses positions up to 5, 10, or 20 residues away. If the nucleotide or amino acid position defines an end of a nucleic acid or amino acid segment, “about” includes positions that fall within a range of 20% or in some embodiments within a range of 10% or in some embodiments within a range of 5% of the length of the nucleic acid or amino acid segment. For any embodiment in which a numerical value is prefaced by “about” or “approximately”, an embodiment is disclosed in which the exact value is recited. For any embodiment in which a numerical value is not prefaced by “about” or “approximately”, an embodiment in which the value is prefaced by “about” or “approximately” is disclosed.
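
For illustration, the numerical meaning of “about” given above can be expressed as a tolerance check. The function names are hypothetical; the default tolerance of 20% and the default offset of 20 residues correspond to the broadest ranges stated in the definition, and the narrower values (10%, 5%; 10 or 5 residues) can be passed explicitly.

```python
def is_about(value, stated, tolerance=0.20):
    """True if value lies within +/- tolerance (a fraction) of stated."""
    return abs(value - stated) <= tolerance * abs(stated)


def is_about_position(position, stated_position, max_offset=20):
    """True if a residue position is within max_offset residues of stated."""
    return abs(position - stated_position) <= max_offset


if __name__ == "__main__":
    print(is_about(115, 100))                   # within 20% of 100
    print(is_about(112, 100, tolerance=0.10))   # outside 10% of 100
    print(is_about_position(1007, 1000, max_offset=10))
```

The sketch does not handle the stated exception for values that would exceed 100% of a possible value, nor the case in which a position defines the end of a segment.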

The term “cell type specific gene” refers to a gene that is typically expressed selectively in one or a small number of cell types relative to its expression in many or most other cell types. A cell type specific gene is typically transcribed under direction of a cell type specific promoter in those cells in which it is transcribed. One of skill in the art will be aware of numerous genes that are considered cell type specific. “Cell type” is used interchangeably herein with “cell identity”. A cell type specific gene need not be expressed only in a single cell type but may be expressed in one or several, e.g., up to about 5, or about 10 different cell types out of the approximately 200 commonly recognized (e.g., in standard histology textbooks) and/or most abundant cell types in an adult vertebrate, e.g., mammal, e.g., human. In some embodiments, a cell type specific gene is one whose expression level can be used to distinguish a cell, e.g., a cell as disclosed herein, such as a cell of one of the following types from cells of the other cell types: adipocyte (e.g., white fat cell or brown fat cell), cardiac myocyte, chondrocyte, endothelial cell, epidermal cells, epithelial cells, exocrine gland cell, fibroblast, glial cell, hematopoietic cells, hepatocyte, hair follicle cells, keratinocyte, macrophage, melanocyte, monocyte, mononuclear cell, myeloid cell, neuron, neutrophil, osteoblast, osteoclast, pancreatic islet cell (e.g., a beta cell), Sertoli cell, skeletal myocyte, smooth muscle cell, B cell, plasma cell, T cell (e.g., regulatory, cytotoxic, helper), or dendritic cell. In some embodiments a cell type specific gene is lineage specific, e.g., it is specific to a particular lineage (e.g., hematopoietic, neural, muscle, etc.). In some embodiments a cell type specific gene may be used to distinguish cells of a particular subtype within a more general type. For example, a cell type specific gene may be specifically expressed in a particular subtype of neuron as compared with other subtypes of neuron. In some embodiments, a cell type specific gene is a gene that is more highly expressed in a given cell type than in most (e.g., at least 80%, at least 90%) or all other cell types. Thus specificity may relate to level of expression, e.g., a gene that is widely expressed at low levels but is highly expressed in certain cell types could be considered cell type specific to those cell types in which it is highly expressed. It will be understood that expression can be normalized based on total mRNA expression (optionally including miRNA transcripts, long non-coding RNA transcripts, and/or other RNA transcripts) and/or based on expression of a housekeeping gene in a cell. In some embodiments, a gene is considered cell type specific for a particular cell type if it is expressed at levels at least 2, 5, or at least 10-fold greater in that cell than it is, on average, in at least 25%, at least 50%, at least 75%, at least 90% or more of the cell types of an adult of that species, or in a representative set of cell types. One of skill in the art will be aware of databases containing expression data for various cell types, which may be used to select cell type specific genes. In some embodiments a cell type specific gene is a transcription factor. The transcription factor may be one that is involved in establishing or maintaining the particular identity (cell type) of the cell (“master transcription factors”). In some embodiments a cell type specific gene is one that encodes a protein or RNA that plays a role in a biological process or function for which cells of a given type are particularly adapted (i.e., it is the only cell type or one of only a few cell types that carry out that biological process or function). Cell type specific genes include, e.g., genes that encode certain intermediate filament proteins (e.g., keratins), tubulins, integrins, enzymes involved in synthesis of specialized cell products such as neurotransmitters or hormones or growth factors, receptors for specialized cell products, and CD molecules. Cell type specific genes and/or their encoded gene products may be referred to as “markers” of cell identity. One of ordinary skill in the art would appreciate that cells of a given type may be identified by their level of expression (e.g., “positive” or “negative”) of one or a combination of cell identity markers. Other characteristics such as morphology, light scatter, and/or location of the cell in the body, may be used alternately or in combination with marker expression levels.
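
The fold-change criterion for cell type specificity described above admits more than one reading. The sketch below adopts one of them, stated here as an assumption: a gene is treated as specific for a cell type if its normalized expression in that type is at least `min_fold` times its expression in at least `min_fraction` of the other cell types in the panel. The function name, parameter names, and expression values are hypothetical.

```python
def is_cell_type_specific(expression, cell_type, min_fold=2.0, min_fraction=0.75):
    """Apply a fold-change specificity test to one gene.

    expression maps cell type name -> normalized expression level of the
    gene. The per-cell-type comparison used here is one assumed reading of
    the glossary criterion, not the only possible one.
    """
    target = expression[cell_type]
    others = [level for name, level in expression.items() if name != cell_type]
    if not others:
        raise ValueError("need at least one other cell type for comparison")
    exceeded = sum(1 for level in others if target >= min_fold * level)
    return exceeded / len(others) >= min_fraction


if __name__ == "__main__":
    # Hypothetical normalized expression levels for a single gene.
    levels = {
        "hepatocyte": 100.0,
        "neuron": 5.0,
        "fibroblast": 10.0,
        "T cell": 60.0,
        "keratinocyte": 2.0,
    }
    print(is_cell_type_specific(levels, "hepatocyte"))
    print(is_cell_type_specific(levels, "neuron"))
```

Expression values would first be normalized, for example to total mRNA or to a housekeeping gene, as noted in the definition.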

The term “cell state specific gene” refers to a gene that is typically expressed selectively in cells in a particular state relative to its expression in many or most cells that are not in that state. In some embodiments a cell state specific gene is one that encodes a protein or RNA that plays a role in establishing or maintaining the particular cell state. For example, the gene may be characterized in that inhibiting its expression causes the cell to cease being in a particular state, e.g., causes the cell to enter a different state or may be characterized in that ectopically expressing the gene (sometimes in combination with one or more other genes) causes a cell that is not in a particular state to assume that state. Cell state specific genes and/or their encoded gene products may be referred to as “markers” of cell state. One of ordinary skill in the art would appreciate that cells of a given type may be identified by their level of expression (e.g., “positive” or “negative”) of one or a combination of cell state markers. Other characteristics such as morphology, light scatter, and/or location of the cell in the body, may be used alternately or in combination with marker expression levels.

The term “DNA methylation” refers to the covalent attachment of a methyl group to DNA at the C5 position of a cytosine ring. In mammals, DNA methylation typically occurs at a cytosine (C) that is followed, in the 5′ to 3′ direction, by a guanine (G). This dinucleotide is often referred to as a CpG. There are approximately 28 million CpGs in the diploid mammalian genome, of which roughly 60%-80% are methylated in somatic cells (Smith, Z. D., and Meissner, A. (2013)). Three enzymes, DNA methyltransferase 1 (DNMT1), DNMT3A, and DNMT3B, are responsible for establishing and maintaining DNA methylation in mammals. DNA methylation is heritable through somatic cell divisions. DNMT1 has a preference for hemimethylated DNA (i.e., double-stranded DNA that is methylated on only one cytosine within CpGs located opposite one another in the two strands) and is mainly responsible for maintaining genomic DNA methylation patterns during DNA replication by methylating cytosines in the newly synthesized strand, thereby converting hemimethylated CpG dinucleotides generated after replication to fully methylated CpG. DNMT3A and DNMT3B are mainly responsible for de novo DNA methylation, i.e., methylation at sites that are not hemimethylated. However, all three enzymes may contribute to both maintenance and de novo DNA methylation. DNMT3L is a catalytically inactive protein that interacts with these enzymes to stimulate DNA methylation. DNA can be demethylated by active and passive processes. So-called passive demethylation occurs through failure to methylate cytosines on the newly synthesized strand during DNA replication, which can result from downregulation of DNMT1. Active demethylation refers to processes in which the methyl group is enzymatically processed and removed. Members of a family of ten-eleven translocation (TET) proteins (e.g., Tet1, Tet2, Tet3) can catalyze oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), followed by stepwise oxidation of 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). 5fC and 5caC can be recognized and excised by thymine DNA glycosylase (TDG) to generate an abasic site, which can be repaired to unmodified cytosine through the base excision repair pathway.
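
The difference between maintenance methylation and passive demethylation can be made concrete with a deliberately simplified model that is not taken from this disclosure. It assumes that, at each replication, the parental strand keeps its methylation, the newly synthesized strand is methylated at a previously methylated CpG with probability `maintenance_efficiency`, and there is no de novo methylation or active demethylation. Under these assumptions the average per-strand methylation level is multiplied by (1 + efficiency) / 2 at every division. All names are hypothetical.

```python
def methylation_after_divisions(initial_level, maintenance_efficiency, divisions):
    """Average strand methylation level after a number of cell divisions.

    Simplified illustrative model: no de novo methylation, no active
    demethylation, and a constant maintenance efficiency per division.
    """
    if not 0.0 <= initial_level <= 1.0:
        raise ValueError("initial_level must be between 0 and 1")
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    # Each daughter duplex pairs one old strand (level m) with one new
    # strand (level m * efficiency), so the mean becomes m * (1 + e) / 2.
    per_division_factor = (1.0 + maintenance_efficiency) / 2.0
    return initial_level * per_division_factor ** divisions


if __name__ == "__main__":
    # Perfect maintenance versus complete loss of maintenance activity.
    for efficiency in (1.0, 0.0):
        levels = [methylation_after_divisions(0.8, efficiency, n) for n in range(4)]
        print(efficiency, levels)
```

With full maintenance the level is stable across divisions, whereas with no maintenance it halves at each division, which is the dilution behavior that the term passive demethylation describes.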

The term “disorder associated with aberrant DNA methylation” refers to any disorder in which aberrant DNA methylation is found more frequently in at least some cells in subjects who have the disorder than in cells of healthy subjects. The term “disorder” encompasses any disorder, disease, syndrome, or other clinical condition. Examples of disorders associated with aberrant DNA methylation include Alzheimer's disease, autism spectrum disorders, autoimmune disorders (e.g., rheumatoid arthritis, lupus), cancer, male infertility, psychiatric disorders (e.g., bipolar disorder, depression, schizophrenia), Rett syndrome, and Fragile X syndrome. Those of ordinary skill in the art are familiar with the clinical characteristics and methods for diagnosis of disorders of interest herein. Imprinting disorders are considered to be disorders associated with aberrant DNA methylation.

The term “identity” or “percent identity” refers to a measure of the extent to which the sequence of two or more nucleic acids or polypeptides is the same. The percent identity between a sequence of interest A and a second sequence B may be computed by aligning the sequences, allowing the introduction of gaps to maximize identity, determining the number of residues (nucleotides or amino acids) that are opposite an identical residue, dividing by the minimum of TG_A and TG_B (here TG_A and TG_B are the sum of the number of residues and internal gap positions in sequences A and B in the alignment), and multiplying by 100. When computing the number of identical residues needed to achieve a particular percent identity, fractions are to be rounded to the nearest whole number. Sequences can be aligned with the use of a variety of computer programs known in the art. For example, computer programs such as BLAST2, BLASTN, BLASTP, Gapped BLAST, etc., may be used to generate alignments and/or to obtain a percent identity. The algorithm of Karlin and Altschul (Karlin and Altschul, Proc. Natl. Acad. Sci. USA 87:2264-2268, 1990), modified as in Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877, 1993, is incorporated into the NBLAST and XBLAST programs of Altschul et al. (Altschul, et al., J. Mol. Biol. 215:403-410, 1990). In some embodiments, to obtain gapped alignments for comparison purposes, Gapped BLAST is utilized as described in Altschul et al. (Altschul, et al. Nucleic Acids Res. 25:3389-3402, 1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs may be used. See worldwide web at subdomain ncbi.nlm.nih.gov and/or McGinnis, S. and Madden, T L, Nucleic Acids Research, 2004, Vol. 32, Web server issue, W20-W25. Other suitable programs include CLUSTALW (Thompson J D, Higgins D G, Gibson T J, Nuc Ac Res, 22:4673-4680, 1994) and GAP (GCG Version 9.1), which implements the Needleman & Wunsch, 1970 algorithm (Needleman S B, Wunsch C D, J Mol Biol, 48:443-453, 1970). Percent identity may be evaluated over a window of evaluation. In some embodiments a window of evaluation may have a length of at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more, e.g., 100%, of the length of the shortest of the sequences being compared. In some embodiments a window of evaluation is at least 100; 200; 300; 400; 500; 600; 700; 800; 900; 1,000; 1,200; 1,500; 2,000; 2,500; 3,000; 3,500; 4,000; 4,500; or 5,000 amino acids. In some embodiments no more than 20%, 10%, 5%, or 1% of positions in either sequence or in both sequences over a window of evaluation are occupied by a gap. In some embodiments no more than 20%, 10%, 5%, or 1% of positions in either sequence or in both sequences are occupied by a gap.
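
By way of non-limiting illustration, the percent identity computation described above may be sketched as follows for two sequences that have already been aligned (e.g., by one of the programs mentioned above). The function names, the use of “-” as the gap character, and the treatment of an exact half as rounding upward are assumptions made only for purposes of this sketch.

```python
import math


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Identical residues are divided by min(TG_A, TG_B), where TG is the number
    of residues plus internal gap positions of a sequence in the alignment.
    """
    a, b = aligned_a.upper(), aligned_b.upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Count alignment columns in which a residue is opposite an identical residue.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    # Stripping terminal gaps leaves residues plus internal gap positions.
    tg_a = len(a.strip(gap))
    tg_b = len(b.strip(gap))
    denominator = min(tg_a, tg_b)
    if denominator == 0:
        raise ValueError("at least one sequence contains no residues")
    return 100.0 * matches / denominator


def identical_residues_needed(percent, denominator):
    """Number of identical residues needed for a given percent identity,
    rounded to the nearest whole number (exact halves rounded up here)."""
    return math.floor(percent * denominator / 100 + 0.5)


if __name__ == "__main__":
    print(percent_identity("ACGT-ACGT", "ACGTTAC--"))
    print(identical_residues_needed(90, 7))
```

In the usage example, six residues are opposite an identical residue, TG_A is 9 (eight residues plus one internal gap), and TG_B is 7 (the two terminal gap positions are not counted), so the percent identity is 6/7 multiplied by 100.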

The term “imprinting disorder” refers to any disorder caused by alterations in the normal imprinting pattern, any disorder caused by changes in expression or gene dosage of an imprinted gene, and/or any disorder caused by the mutation or deletion of an imprinted gene. Non-limiting examples of imprinting disorders include Angelman syndrome, Prader-Willi syndrome, Beckwith-Wiedemann syndrome, Silver-Russell syndrome, and certain forms of pseudohypoparathyroidism.

The term “integrated” when used to refer to a nucleic acid (e.g., a DNA methylation reporter) being integrated into the genome of a cell means that the nucleic acid is incorporated into the genome of the cell. It should be understood that use of the term “integrated” is not intended to imply any particular mechanism by which such incorporation occurs. “Integration” encompasses processes by which exogenous DNA is directly incorporated into the genome as well as processes in which exogenous nucleic acid is used as a template for homology-directed repair of a break in genomic DNA resulting in some sequences from the exogenous DNA being introduced into the genome. The incorporated DNA is joined to the genomic DNA by phosphodiester bonds, and, if the cell undergoes cell division, it will typically be replicated and inherited by the cell's descendants, and is considered to be integrated into the genome of the cell's descendants. The terms “integrated”, “inserted”, “introduced”, and “incorporated” into the genome of a cell may be used interchangeably herein.

An “effective amount” or “effective dose” of an agent (or composition containing such agent) generally refers to the amount sufficient to achieve a desired biological and/or pharmacological effect, e.g., when contacted with a cell in vitro or administered to a subject according to a selected administration form, route, and/or schedule. As will be appreciated by those of ordinary skill in the art, the absolute amount of a particular agent or composition that is effective may vary depending on such factors as the desired biological or pharmacological endpoint, the agent to be delivered, the target tissue, etc. Those of ordinary skill in the art will further understand that an “effective amount” may be contacted with cells or administered in a single dose, or through use of multiple doses, in various embodiments. It will be understood that any agents, nucleic acid constructs, compounds, and compositions herein may be employed in an amount effective to achieve a desired biological and/or therapeutic effect.

The term “matched cells” typically refers to cells of the same species and cell type as particular cells of interest, or to comparable cells known to have similar properties with respect to DNA methylation of the DNA region(s) under consideration. Matched cells may be of the same developmental stage and/or differentiation state as cells of interest. Any method or experiment that includes manipulating a cell (e.g., exposing a cell to an agent) may include a comparison with matched cells as controls that are not so manipulated.

The term “promoter” refers to a regulatory region of DNA that directs transcription of a nucleic acid (the process by which RNA is synthesized using DNA as a template). A promoter for a particular gene is typically located within the region extending from up to about 2 kilobases (kb) upstream from the transcription start site (TSS) for that gene up to about 500 bp downstream from the TSS. A promoter contains DNA sequences with which general transcription factors and RNA polymerase associate to form a transcription pre-initiation complex near the transcription start site and typically also contains one or more binding sites for additional transcription factor(s). A promoter that comprises a variant or fragment of a naturally occurring promoter region may be said to be “derived from” the naturally occurring promoter. Mammalian promoters can be generally classified into those that contain a TATA box, those that are CpG enriched (e.g., contain a CpG island), and those that both contain a TATA box and are CpG enriched. A “constitutive” or “ubiquitous” promoter is one that is active (“on”) in most cells (in the case of a multicellular organism), in most cell states, and under most environmental conditions. Promoters that are not constitutive may be cell type specific or tissue-specific (active in particular cell types or tissues but inactive (“off”) in others) or cell state specific (active in cells in particular states but inactive in other cells), may be subject to developmental regulation (active during one or more stages of development but not in others), may be active only during particular stages of a biological process such as cell division, and/or may be subject to environmental regulation. An “inducible” promoter is one whose activity can be regulated by an environmental condition such as the presence or absence of a particular substance, temperature, etc.

The term “minimal promoter” refers to the smallest portion of a promoter that has the ability to drive transcription at a detectable level. For purposes of the present disclosure, a “minimal promoter” may contain up to an additional 50, 100, or 200 bp of sequence flanking either or both sides of this smallest portion. For example, if the smallest portion of a naturally occurring promoter that has the ability to drive transcription at a detectable level extends from −100 to +50 (with +1 representing the TSS), then a minimal promoter may comprise a sequence that extends from −300 to +250. In some embodiments a minimal promoter is able to drive transcription at a level at least 50%, 60%, 70%, 80%, or 90% of the level of a naturally occurring promoter region from which it is derived, e.g., between about 50% and about 75% or between about 75% and about 100% of the level of the promoter from which it is derived, when measured under the same or comparable conditions using the same or a comparable assay. In some embodiments, a minimal promoter is characterized in that removal of at least 50 nt, or in some embodiments removal of at least 100 nt, or in some embodiments removal of at least 200 nt, from either or both ends, would markedly reduce the level of transcription, e.g., by at least 50%, or at least 75%.

The term “promoter region” refers to a region of genomic DNA that extends from 2.5 kb upstream of the transcriptional start site (TSS) of a gene to 500 bp downstream of the TSS, i.e., from position −2500 to position +500 relative to the TSS (defined as position +1).
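
By way of non-limiting illustration, the coordinates of a promoter region as defined above may be computed from the genomic coordinate of a TSS as sketched below. The function names are hypothetical. The sketch assumes 1-based genomic coordinates and the common numbering convention in which there is no position 0, i.e., position −1 is the nucleotide immediately upstream of the TSS at position +1; that convention is an assumption and is not specified above.

```python
def relative_to_genomic(tss, strand, rel):
    """Map a TSS-relative position (TSS = +1, no position 0) to a genomic coordinate."""
    if rel == 0:
        raise ValueError("position 0 is not defined when the TSS is +1")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # +1 is the TSS itself, so positive positions are offset by rel - 1.
    offset = rel - 1 if rel > 0 else rel
    return tss + offset if strand == "+" else tss - offset


def promoter_region(tss, strand, upstream=2500, downstream=500):
    """Return (start, end), 1-based inclusive, spanning -upstream to +downstream."""
    a = relative_to_genomic(tss, strand, -upstream)
    b = relative_to_genomic(tss, strand, downstream)
    start, end = min(a, b), max(a, b)
    # Do not run off the beginning of the chromosome.
    return max(start, 1), end


if __name__ == "__main__":
    print(promoter_region(10000, "+"))
    print(promoter_region(10000, "-"))
```

Under the stated convention, a region from −2500 to +500 spans 3,000 nucleotides, and upstream sequence lies at higher genomic coordinates for a gene on the minus strand.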

The term “enhancer” refers to a region of genomic DNA to which proteins (e.g., transcription factors) bind to enhance (increase) transcription of a gene. Enhancers may be located some distance away from the promoters and transcription start site (TSS) of genes whose transcription they regulate and may be located upstream or downstream of the TSS. Enhancers can be identified using methods known to those of ordinary skill in the art based on one or more characteristic properties. For example, H3K27Ac is a histone modification associated with active enhancers (Creyghton et al., 2010b; Rada-Iglesias et al., 2010). In some embodiments enhancers are identified as regions of genomic DNA that when present in a cell show enrichment for acetylated H3K27 (H3K27Ac), enrichment for methylated H3K4 (H3K4me1), or both. Enhancers can additionally or alternately be identified as regions of genomic DNA that when present in a cell are enriched for occupancy by transcription factors. Histone modifications can be detected using chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-Chip) or followed by sequencing (ChIP-Seq) or other methods known in the art. These methods may also or alternately be used to detect occupancy of genomic DNA by transcription factors (or other proteins). A peak-finding algorithm such as that implemented in MACS version 1.4.2 (model-based analysis of ChIP-seq) or subsequent versions thereof may be used to identify regions of ChIP-seq enrichment over background (Zhang, Y., et al. (2008). Genome Biol. 9, R137). In some embodiments a p-value threshold of enrichment of 10⁻⁹ may be used.

The term “superenhancer” refers to a region of genomic DNA that contains at least two enhancers, e.g., a cluster of enhancers, wherein the genomic region is occupied when present within a cell by more transcriptional coactivator (e.g., Mediator) than the average single enhancer within the cell. Super-enhancers are typically also enriched for occupancy by cell type specific transcription factors, including master transcription factors, are typically associated with genes that play key roles in cell identity (including genes encoding such transcription factors), and can enhance the expression of such genes. Super-enhancers can be identified and/or assigned to genes whose transcription is regulated by the super-enhancer using methods known in the art. Occupancy of genomic DNA by Mediator, transcription factors, or other proteins can be detected using ChIP-Chip, ChIP-Seq, or other methods known in the art. Numerous super-enhancers and their target genes have been identified. See, e.g., U.S. Patent Application Pub. Nos. 20140296218 and 20140287932; Whyte et al., 2013; Hnisz et al., 2013; Lovén et al. (2013) Cell 153, 320-334; and/or PCT/US2013/066957 (WO/2014/066848). A catalog of super-enhancers, typical enhancers, and associated genes in 86 human samples from a broad range of cell and tissue types, and a description of methods used to identify them, is found in Hnisz et al. and in PCT/US2013/066957 (WO/2014/066848).

The term “transcription start site” (TSS) refers to the DNA nucleotide at which transcription of an RNA begins, i.e., the nucleotide that is transcribed to yield the first ribonucleotide in an RNA transcript. TSSs may be defined based on RefSeq gene annotations.

The term “gene body” refers to the portion of a gene that is transcribed, from the transcription start site to the end of the transcribed region.

The terms “enriched” or “enrichment” refer to the presence of something at a higher level in a first region or under a first condition than in a second region or under a second condition with which it is compared. If a second region or condition is not specified, it should be assumed that enrichment refers to the level in the first region or under the first condition relative to the background level of that thing in the setting in which it occurs. For example, a DNA region is considered “enriched” for a particular nucleotide or sequence motif or for a particular nucleic acid modification or histone modification if that nucleotide, motif, or modification is present at a higher level within the region than in the genome as a whole. Preferably the difference between the two levels is statistically significant. In some embodiments, enrichment refers to an increase by at least a factor of 2, 5, 10, 20, or 50. In some embodiments enrichment is evident as a peak when the level of a particular nucleic acid modification or other genomic feature is measured across the genome or a portion thereof.
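
By way of non-limiting illustration, fold enrichment of a feature in a region relative to the genome-wide background may be computed as sketched below. The function names and the use of a per-nucleotide density as the “level” of the feature are assumptions made only for purposes of this sketch, and the sketch does not assess statistical significance.

```python
def fold_enrichment(region_count, region_length, genome_count, genome_length):
    """Ratio of the feature density in a region to its density in the genome as a whole."""
    if region_length <= 0 or genome_length <= 0:
        raise ValueError("lengths must be positive")
    if genome_count <= 0:
        raise ValueError("background level must be positive")
    region_density = region_count / region_length
    background_density = genome_count / genome_length
    return region_density / background_density


def is_enriched(fold, min_fold=2.0):
    """True if the fold enrichment meets a chosen threshold (e.g., 2, 5, 10, 20, or 50)."""
    return fold >= min_fold


if __name__ == "__main__":
    # Hypothetical counts: 50 occurrences in a 1 kb region versus
    # 1,000 occurrences in a 1 Mb genome.
    fold = fold_enrichment(50, 1_000, 1_000, 1_000_000)
    print(fold, is_enriched(fold, min_fold=10))
```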

The term “nucleic acid” refers to polynucleotides such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). The terms “nucleic acid” and “polynucleotide” are used interchangeably herein and should be understood to include double-stranded polynucleotides, single-stranded (such as sense or antisense) polynucleotides, and partially double-stranded polynucleotides. A nucleic acid often comprises standard nucleotides typically found in naturally occurring DNA or RNA (which can include modifications such as methylated nucleobases), joined by phosphodiester bonds. In some embodiments a nucleic acid may comprise one or more non-standard nucleotides, which may be naturally occurring or non-naturally occurring (i.e., artificial; not found in nature) in various embodiments and/or may contain a modified sugar or modified backbone linkage. Nucleic acid modifications (e.g., base, sugar, and/or backbone modifications), non-standard nucleotides or nucleosides, etc., such as those known in the art as being useful in the context of RNA interference (RNAi), aptamer, CRISPR technology, polypeptide production, reprogramming, or antisense-based molecules for research or therapeutic purposes may be incorporated in various embodiments. Such modifications may, for example, increase stability (e.g., by reducing sensitivity to cleavage by nucleases), decrease clearance in vivo, increase cell uptake, or confer other properties that improve translation, potency, efficacy, or specificity, or otherwise render the nucleic acid more suitable for an intended use. Various non-limiting examples of nucleic acid modifications are described in, e.g., Deleavey G F, et al., Chemical modification of siRNA. Curr. Protoc. Nucleic Acid Chem. 2009; 39:16.3.1-16.3.22; Crooke, S T (ed.) Antisense drug technology: principles, strategies, and applications, Boca Raton: CRC Press, 2008; Kurreck, J. (ed.) Therapeutic oligonucleotides, RSC biomolecular sciences. Cambridge: Royal Society of Chemistry, 2008; U.S. Pat. Nos. 4,469,863; 5,536,821; 5,541,306; 5,637,683; 5,637,684; 5,700,922; 5,717,083; 5,719,262; 5,739,308; 5,773,601; 5,886,165; 5,929,226; 5,977,296; 6,140,482; 6,455,308 and/or in PCT application publications WO 00/56746 and WO 01/14398. Different modifications may be used in the two strands of a double-stranded nucleic acid. A nucleic acid may be modified uniformly or on only a portion thereof and/or may contain multiple different modifications. Where the length of a nucleic acid or nucleic acid region is given in terms of a number of nucleotides (nt) it should be understood that the number refers to the number of nucleotides in a single-stranded nucleic acid or in each strand of a double-stranded nucleic acid unless otherwise indicated. An “oligonucleotide” is a relatively short nucleic acid, typically between about 5 and about 100 nt long.

The term “operably linked” refers to a nucleic acid regulatory element and a nucleic acid sequence being appropriately positioned relative to each other so as to place expression of the nucleic acid under the influence or control of the regulatory element(s). For example, a promoter and a nucleic acid are considered “operably linked” if they are positioned in such a way in a DNA molecule that the promoter region is capable of directing transcription of the nucleic acid under appropriate conditions. As used herein, “operably linked” refers to the positional relationship between the regulatory element(s) and the nucleic acid sequence as distinct from the activity level of the promoter. It will be understood that whether a particular promoter does in fact direct transcription of an operably linked nucleic acid molecule, and the level of transcription, may depend on a variety of factors, such as the presence or absence of appropriate transcription factors and/or the presence or absence of inhibitory substances or other factors that may affect the activity of the promoter.

The term “pluripotent” refers to a cell that has the ability to self-renew and to differentiate into cells of all three embryonic germ layers (endoderm, mesoderm and ectoderm) and, typically, has the potential to divide in vitro for a long period of time, e.g., at least 20, at least 25, or at least 30 passages, or more (e.g., up to 80 passages, or up to 1 year, or more), without losing its self-renewal and differentiation properties. A pluripotent cell is said to exhibit or be in a “pluripotent state”. A pluripotent cell line or cell culture is often characterized in that the cells can differentiate into a wide variety of cell types in vitro and in vivo. Cells that are able to form teratomas containing cells having characteristics of endoderm, mesoderm, and ectoderm when injected into SCID mice are considered pluripotent. Cells that possess the ability to participate in formation of chimeras (upon injection into a blastocyst of the same species that is transferred to a suitable foster mother of the same species) that survive to term are pluripotent. If the germ line of the chimeric animal contains cells derived from the introduced cell, the cell is considered germline-competent in addition to being pluripotent. Pluripotent cells (also referred to as pluripotent stem cells) include embryonic stem (ES) cells and induced pluripotent stem (iPS) cells. Embryonic stem cells are pluripotent stem cells that are derived directly from an embryo, e.g., from a single blastomere, a morula, or the inner cell mass of a blastocyst, or by somatic cell nuclear transfer (SCNT). Those of ordinary skill in the art are aware of suitable methods for deriving mammalian ES cells from mice, rats, humans, non-human primates, and other mammalian species. See Behringer, R, et al., Manipulating the Mouse Embryo, A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2013 for exemplary techniques for deriving murine ES cells. Exemplary techniques for deriving primate ES cells are found in U.S. Pat. No. 6,200,806; Turksen, K. (ed.), Methods in Molecular Biology, Vol. 331, Humana Press, Inc., Totowa, NJ, 2006; PCT/US2011/000850 (WO/2011/142832); and Zaninovic N, et al., Methods Mol Biol. 2014; 1154:121-44.

The term “polypeptide” refers to a polymer of amino acids linked by peptide bonds. A protein is a molecule comprising one or more polypeptides. A peptide is a relatively short polypeptide, typically between about 2 and 100 amino acids (aa) in length, e.g., between 4 and 60 aa, between 8 and 40 aa, or between 10 and 30 aa. The terms “protein”, “polypeptide”, and “peptide” may be used interchangeably. In general, a polypeptide may contain only standard amino acids or may comprise one or more non-standard amino acids (which may be naturally occurring or non-naturally occurring amino acids) and/or amino acid analogs in various embodiments. A “standard amino acid” is any of the 20 L-amino acids that are commonly utilized in the synthesis of proteins by mammals and are encoded by the genetic code. A “non-standard amino acid” is an amino acid that is not commonly utilized in the synthesis of proteins by mammals. Non-standard amino acids include naturally occurring amino acids (other than the 20 standard amino acids) and non-naturally occurring amino acids. An amino acid, e.g., one or more of the amino acids in a polypeptide, may be modified, for example, by addition, e.g., covalent linkage, of a moiety such as an alkyl group, an alkanoyl group, a carbohydrate group, a phosphate group, a lipid, a polysaccharide, a halogen, a linker for conjugation, a protecting group, a small molecule (such as a fluorophore), etc.

The term “purified” may be used herein to refer to an isolated nucleic acid or polypeptide that is present in the substantial absence of other biological macromolecules, e.g., other nucleic acids and/or polypeptides. In some embodiments a purified nucleic acid (or nucleic acids) is substantially separated from cellular polypeptides. In some embodiments, the ratio of nucleic acid to polypeptide is at least 5:1 or at least 10:1 by dry weight. In some embodiments a purified polypeptide is separated from cellular nucleic acids. In some embodiments, the ratio of polypeptide to nucleic acid is at least 5:1 or at least 10:1 by dry weight. In some embodiments, a nucleic acid or polypeptide is purified such that it constitutes at least 75%, 80%, 85%, or 90% by weight, e.g., at least 95% by weight, e.g., at least 99% by weight, or more, of the total nucleic acid or polypeptide material present. In some embodiments, water, buffers, ions, and/or small molecules (e.g., precursors such as nucleotides or amino acids), can optionally be present in a purified preparation. A purified molecule may be prepared by separating it from other substances (e.g., other cellular materials), or by producing it in such a manner as to achieve purity. In some embodiments, a purified molecule or composition refers to a molecule or composition comprising one or more molecules that is prepared using any art-accepted method of purification.

As used herein, two regions or positions (or a region and a position) within a DNA molecule (e.g., a chromosome) are said to be “in proximity to” each other if the distance between them in terms of nucleotides (i.e., the length of any intervening DNA between them) is no more than 20 kb. In some embodiments the distance is no more than 10 kb, no more than 5 kb, no more than 2 kb, no more than 1 kb, no more than 500 nt, no more than 250 nt, no more than 100 nt, no more than 50 nt, no more than 25 nt, no more than 10 nt, no more than 5 nt, or 0 nt (i.e., the regions, positions, or region and position are directly adjacent to each other). If a first nucleic acid is integrated into a particular region of DNA in the genome, the nucleic acid is said to be in proximity to the region of DNA, and vice versa.
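
By way of non-limiting illustration, whether two regions or positions on the same DNA molecule are “in proximity to” each other as defined above may be determined as sketched below. The function names and the representation of a region as a 1-based inclusive (start, end) pair, with a single position p written as (p, p), are assumptions made only for purposes of this sketch.

```python
def intervening_nt(region_a, region_b):
    """Length of the DNA between two regions on the same molecule.

    Each region is a 1-based inclusive (start, end) pair. Overlapping or
    directly adjacent regions have 0 intervening nucleotides.
    """
    (_, first_end), (second_start, _) = sorted([region_a, region_b])
    return max(second_start - first_end - 1, 0)


def in_proximity(region_a, region_b, max_nt=20_000):
    """True if the intervening DNA is no longer than max_nt (20 kb by default)."""
    return intervening_nt(region_a, region_b) <= max_nt


if __name__ == "__main__":
    reporter = (50_001, 53_000)       # hypothetical integrated reporter
    region_of_interest = (60_000, 61_000)
    print(intervening_nt(reporter, region_of_interest))
    print(in_proximity(reporter, region_of_interest, max_nt=5_000))
```

A narrower embodiment (e.g., no more than 5 kb or no more than 1 kb) corresponds to passing a smaller value for the hypothetical `max_nt` parameter.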

The term “reporter molecule” refers to a molecule that can be used as an indicator of the occurrence or level of a particular biological process, activity, event, or state in a cell or organism. Reporter molecules typically have one or more properties or enzymatic activities that allow them to be readily measured or that allow selection of a cell that expresses the reporter molecule. In general, a cell can be assayed for the presence of a reporter molecule by measuring the reporter molecule itself or an enzymatic activity of the reporter protein. Detectable characteristics or activities that a reporter molecule may have include, e.g., fluorescence, bioluminescence, ability to catalyze a reaction that produces a fluorescent or colored substance in the presence of a suitable substrate, or other readouts based on emission and/or absorption of photons (light). Typically, a reporter molecule is a molecule that is not endogenously expressed by a cell or organism in which the reporter molecule is used.

The term “reporter gene” refers to a nucleic acid that encodes a reporter molecule. A reporter gene can be operably linked to a promoter sequence to produce a reporter construct that can be used to assay for the transcriptional activity of the promoter in a cell. The reporter construct may be assembled in or inserted into a vector. The reporter construct or vector may be transferred into one or more cells. After transfer, cells are assayed for the presence of the reporter molecule by measuring the reporter molecule or the activity (e.g., enzymatic activity) of the reporter molecule. In some embodiments, a reporter gene is codon-optimized for expression in mammalian cells.

The term “reprogramming” refers to a process that alters the differentiation state of a somatic cell to a less differentiated state or that converts a somatic cell from one cell type to a different cell type. Reprogramming that converts a cell of a first differentiated cell type to a cell of a second differentiated cell type without undergoing an intermediate pluripotent state is sometimes referred to as “transdifferentiation” or “direct reprogramming”. In some embodiments, reprogramming comprises altering the differentiation state of a somatic cell to a pluripotent state. The resulting pluripotent cell is sometimes referred to as an “induced pluripotent stem cell” (iPS cell). Those of ordinary skill in the art are aware of suitable in vitro methods for reprogramming, e.g., for deriving iPS cells from mammalian somatic cells of diverse species, e.g., mice, rats, humans, non-human primates, and other mammalian species. In general, embryonic, fetal, or adult somatic cells may be used. In general, any type of somatic cell may be used, such as fibroblasts, keratinocytes, and peripheral blood mononuclear cells, to name a few. See Behringer, R, et al., Manipulating the Mouse Embryo, A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2013; US Pat. Pub. Nos. 20110076678 and 20120028821 for exemplary techniques for generating iPS cells. In general, suitable methods can include causing a somatic cell to express appropriate pluripotency-associated genes, e.g., genes that encode pluripotency-associated transcription factors (TFs). Examples of TFs that can be used to generate iPS cells include Oct4, Klf4 (or other Klf family members such as Klf2 or Klf5), Sox2 (or other Sox family members such as Sox1 or Sox3), Nanog, Lin28, and Myc (c-Myc, L-Myc, N-Myc). A single factor may be expressed or two, three, four, or more of these factors may be expressed in various combinations (e.g., Oct4, Klf4, and Sox2; Oct4, Klf4, Sox2, and Myc; or Oct4, Sox2, Nanog, and Lin28) as known in the art. In some embodiments microRNAs may be used in generating iPS cells. For example, miR-302, miR-367, miR-200c, or miR-369s may be used. In some embodiments inhibition of p53 by RNAi (e.g., using a shRNA cassette that encodes a shRNA that inhibits p53 expression) may be combined with expression of one or more reprogramming factors. Expression may be achieved by a variety of methods. One or more vectors comprising expression cassettes encoding the factors (which may become integrated into the genome or may be extrachromosomal elements such as episomes derived from Epstein-Barr virus (e.g., as described in Yu, J., et al., Science. (2009) 324(5928):797-801)) or translatable mRNA (e.g., synthetic modified stabilized mRNA, e.g., as described in Warren et al., Cell Stem Cell 7(5):618-30, 2010; Mandal P K, Rossi D J. Nat Protoc. 2013 8(3):568-82; US Pat. Pub. No. 20120046346; and/or PCT/US2011/032679 (WO/2011/130624)) encoding the factors may be introduced into cells, e.g., by transfection. Transdifferentiation of a cell from a first cell type to a second cell type can be performed by ectopically expressing one or more lineage-specific transcription factors, e.g., master transcription factors, of the second cell type in the cell of the first cell type. For example, expressing the bHLH transcription factor MyoD in fibroblasts can transform them into myoblasts by activating muscle-specific genes. Direct reprogramming of fibroblasts and other cell types to neurons, cardiomyocytes, hepatocytes, skeletal muscle cells, and other cell types has been achieved. See Morris, S A and Daley, G Q, Cell Research (2013) 23:33-48 for review. For example, cells have been directly reprogrammed into β-islet cells, cardiomyocytes, and neurons by using NPM (Ngn3, Pdx1, and Mafa), GMT (GATA4, MEF2C, and TBX5), and ABM (Ascl1, Brn2, and Myt1l), respectively. As known in the art, various small molecules such as histone deacetylase (HDAC) inhibitors or molecules that act on various signaling pathways can enhance reprogramming (e.g., increase reprogramming efficiency) and/or replace one or more of the transcription factors. It will be understood that many different reprogramming factors, small molecules, and combinations thereof have been successfully used for reprogramming. In some embodiments cells to be reprogrammed harbor genes encoding one or more reprogramming factors under control of an inducible promoter. Reprogramming may be performed by placing the cells under inducing conditions, e.g., contacting the cells with a suitable inducing agent. In some embodiments a reprogramming method that avoids integration of exogenous DNA into the genome may be used. In some embodiments cells to be reprogrammed are obtained from a non-human animal that harbors one or more transgenes comprising a reprogramming factor operably linked to an inducible promoter.

The term “selectable marker” or “selectable marker gene” refers to a nucleic acid that encodes an RNA or protein that confers on a cell an increased ability to survive and/or proliferate under particular conditions (“selective conditions”) relative to cells that lack or do not express the selectable marker. In some embodiments the selectable marker allows the cell to survive or proliferate under selective conditions that, absent the selectable marker, would ordinarily cause the cell to die or cease proliferating. The particular selective conditions may be the presence of an ordinarily toxic substance in the culture medium or an insufficient amount of particular nutrient(s) that are required by the cell for survival or proliferation. Those of ordinary skill in the art are aware of suitable selectable markers of use in cells of interest, e.g., bacterial or mammalian cells. Antibiotic resistance markers are a non-limiting example of a class of selectable marker. A selectable marker of this type that is commonly used in mammalian cells is the neomycin resistance gene (an aminoglycoside 3′-phosphotransferase, 3′ APH II). Expression of this selectable marker renders cells resistant to various antibiotics such as G418. Additional antibiotic resistance markers encode enzymes conferring resistance to Zeocin™, hygromycin, puromycin, blasticidin, gentamicin, kanamycin, etc. A second non-limiting class of selectable markers is nutritional markers. Such selectable markers generally encode enzymes that function in a biosynthetic pathway to produce a compound that is needed for cell proliferation or survival. In general, under nonselective conditions the compound is present in the environment or is produced by an alternative pathway in the cell. Under selective conditions, functioning of the biosynthetic pathway in which the selectable marker is involved is needed to produce the compound.

The term “site-specific recombinase” (also referred to simply as a“recombinase” herein) refers to a protein that can recognize andcatalyze the recombination of DNA between specific sequences in a DNAmolecule. Such sequences may be referred to as “recombination sequences”or “recombination sites” for that particular recombinase. Tyrosinerecombinases and serine recombinases are the two main families ofsite-specific recombinase. Examples of site-specific recombinase systemsinclude the Cre/Lox system (Cre recombinase mediates recombinationbetween loxP), the Flp/Frt system (Flp recombinase mediatesrecombination between FRT sites), and the PhiC31 system (PhiC31recombinase mediates DNA recombination at sequences known as attB andattP sites). Recombinasc systems similar to Cre include the Dre-rox,VCre/VloxP, and SCre/SloxP systems (Anastassiadis K, et al. (2009) DisModel Mech 2(9-10):508-515; Suzuki E, Nakayama M (2011) Nucl. Acids Res.(2011) 39 (8): e49. It should be understood that reference to aparticular recombinase system is intended to encompass the variousengineered and mutant forms of the recombinases and recombination sitesand codon-optimized forms of the coding sequences known in the art.Site-specific recombinases can be used to delete or invert DNA locatedbetween the recombinase sites or mediate integration. For example,inverted Lox sites on the same chromosome will cause an inversion of theintervening DNA, while a direct repeat of Lox sites (Lox sites in thesame orientation) will cause deletion of the intervening DNA. DNA placedbetween two loxP sites is said to be “floxed”. A gene may be modified bythe insertion of two loxP sites that allow the excision of the floxedgene segment through Cre-mediated recombination. 
In some embodiments,expression of Cre may be under control of a cell type specific, cellstate specific, or inducible expression control element (e.g., cell typespecific, cell state specific, or inducible promoter) or Cre activitymay be regulated by a small molecule. For example, Cre may be fused to aligand binding domain of a receptor (e.g., a steroid hormone receptor)so that its activity is regulated by receptor ligands. Cre-ER(T) orCre-ER(T2) recombinases may be used, which comprise a fusion proteinbetween a mutated ligand binding domain of the human estrogen receptor(ER) and Cre, the activity of which can be induced by, e.g.,4-hydroxy-tamoxifen. Placing Lox sequences appropriately allows avariety of genomic manipulations. For example, genes can be activated orrepressed.

The term “safe harbor” locus refers to an intragenic or extragenic region of the mammalian genome that is able to accommodate the predictable expression of newly integrated DNA without adverse effects on the host cell (or on an animal whose cells harbor the integrated DNA). In some embodiments the safe harbor locus is the AAVS1 (the natural integration site for the wild-type AAV on chromosome 19), ROSA26, or CCR5 locus. The locations of these loci are well known in the art.

The term “small molecule” as used herein refers to an organic molecule that is less than about 2 kilodaltons (kDa) in mass. In some embodiments, the small molecule is less than about 1.5 kDa, or less than about 1 kDa. In some embodiments, the small molecule is less than about 800 daltons (Da), 600 Da, 500 Da, 400 Da, 300 Da, 200 Da, or 100 Da. Often, a small molecule has a mass of at least 50 Da. In some embodiments, a small molecule is non-polymeric. In some embodiments, a small molecule is not an amino acid. In some embodiments, a small molecule is not a nucleotide. In some embodiments, a small molecule is not a saccharide. In some embodiments, a small molecule contains multiple carbon-carbon bonds and can comprise one or more heteroatoms and/or one or more functional groups important for structural interaction with proteins (e.g., hydrogen bonding), e.g., an amine, carbonyl, hydroxyl, or carboxyl group, and in some embodiments at least two functional groups. Small molecules often comprise one or more cyclic carbon or heterocyclic structures and/or aromatic or polyaromatic structures, optionally substituted with one or more of the above functional groups.

A “subject” may be any vertebrate organism in various embodiments. In some embodiments a subject is a mammal, e.g., a human, non-human primate, rodent (e.g., mouse, rat, hamster), rabbit, ungulate (e.g., ovine, bovine, equine, caprine species), canine, or feline. A subject may be an individual to whom an agent is administered, e.g., for experimental, diagnostic, and/or therapeutic purposes, or from whom a biological sample (e.g., a sample containing one or more cells) is obtained.

The term “targetable nuclease” refers to a nuclease that can be programmed to produce site-specific DNA breaks, e.g., double-stranded breaks (DSBs), at a selected site in DNA. Such a site may be referred to as a “target site”. The target site can be selected by appropriate design of the targetable nuclease or by providing a guide molecule (e.g., a guide RNA) that directs the nuclease to the target site. Examples of targetable nucleases include zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), RNA-guided nucleases (RGNs) such as the Cas proteins of the CRISPR/Cas Type II system, and engineered meganucleases.

A “variant” of a particular polypeptide or polynucleotide has one or more alterations (e.g., additions, substitutions, and/or deletions) with respect to the polypeptide or polynucleotide, which may be referred to as the “original polypeptide” or “original polynucleotide”, respectively. An addition may be an insertion or may be at either terminus. A variant may be shorter or longer than the original polypeptide or polynucleotide. The term “variant” encompasses “fragments”. A “fragment” is a continuous portion of a polypeptide or polynucleotide that is shorter than the original polypeptide or polynucleotide. In some embodiments a variant comprises or consists of a fragment. In some embodiments a fragment or variant is at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or more as long as the original polypeptide or polynucleotide. A fragment may be an N-terminal, C-terminal, or internal fragment. In some embodiments a variant polypeptide comprises or consists of at least one domain of an original polypeptide. In some embodiments a variant polynucleotide hybridizes to an original polynucleotide under stringent conditions, e.g., high stringency conditions, for sequences of the length of the original polynucleotide. In some embodiments a variant polypeptide or polynucleotide comprises or consists of a polypeptide or polynucleotide that is at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or more identical in sequence to the original polypeptide or polynucleotide over at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the original polypeptide or polynucleotide.
In some embodiments a variant polypeptide comprises or consists of a polypeptide that is at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or more identical in sequence to the original polypeptide over at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the original polypeptide, with the proviso that, for purposes of computing percent identity, a conservative amino acid substitution is considered identical to the amino acid it replaces. In some embodiments a variant polypeptide comprises or consists of a polypeptide that is at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or more identical to the original polypeptide over at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the original polypeptide, with the proviso that any one or more amino acid substitutions (up to the total number of such substitutions) may be restricted to conservative substitutions. In some embodiments a percent identity is measured over at least 100; 200; 300; 400; 500; 600; 700; 800; 900; 1,000; 1,200; 1,500; 2,000; 2,500; 3,000; 3,500; 4,000; 4,500; or 5,000 amino acids. In some embodiments the sequence of a variant polypeptide comprises or consists of a sequence that has N amino acid differences with respect to an original sequence, wherein N is any integer between 1 and 10 or between 1 and 20 or any integer up to 1%, 2%, 5%, or 10% of the number of amino acids in the original polypeptide, where an “amino acid difference” refers to a substitution, insertion, or deletion of an amino acid. In some embodiments a difference is a conservative substitution. Conservative substitutions may be made, e.g., on the basis of similarity in side chain size, polarity, charge, solubility, hydrophobicity, hydrophilicity and/or the amphipathic nature of the residues involved.
For example, non-polar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, tryptophan, and methionine; polar/neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine; positively charged (basic) amino acids include arginine, lysine, and histidine; and negatively charged (acidic) amino acids include aspartic acid and glutamic acid. It should be understood that the use of functional variants of any of the nucleic acids and/or polypeptides described herein is within the scope of the present disclosure. In some embodiments a variant is a functional variant, i.e., the variant at least in part retains at least one activity of interest of the original polypeptide or polynucleotide. An activity of interest may be any activity that is useful in a composition or a method described herein. An activity may be, e.g., fluorescence, catalytic activity (e.g., luciferase activity, cleavage activity), binding activity, ability to perform or participate in a biological function or process, etc. In some embodiments a variant may have an activity of at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more, of the activity of the original polypeptide or polynucleotide, up to approximately 100%, 125%, 150%, 200%, 500%, 1000%, or more of the activity of the original polypeptide or polynucleotide, in various embodiments. In some embodiments a variant may have a qualitatively different activity to the polynucleotide or polypeptide from which it is derived. In some embodiments a variant, e.g., a functional variant, comprises or consists of a polypeptide or polynucleotide at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 100% identical to an original polypeptide or polynucleotide over at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the original polypeptide or polynucleotide.
In some embodiments a variant may have reduced activity with respect to one or more activities that may be detrimental or undesirable in the context of a composition or method described herein, while retaining one or more activities that are useful or desirable in a composition or method described herein. In some embodiments an alteration, e.g., a substitution or deletion, e.g., in a functional variant, does not alter or delete an amino acid or nucleotide that is known or predicted to be important for an activity, e.g., a known or predicted catalytic residue or residue involved in binding a substrate or cofactor. In some embodiments nucleotide(s), amino acid(s), or region(s) exhibiting lower degrees of conservation across species as compared with other amino acids or regions may be selected for alteration. Variants may be tested in one or more suitable assays to assess activity.
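As an illustration of how percent identity might be computed with and without the proviso that conservative substitutions count as identical, the following minimal sketch may be helpful. It assumes two pre-aligned polypeptide sequences of equal length with no gaps, which is a simplification. The names `GROUPS`, `group_of`, and `percent_identity` are hypothetical. The residue classes are the four listed above, and a residue not listed in any class (e.g., phenylalanine) is treated as matching only itself.

```python
# One-letter codes for the four residue classes listed in the text.
GROUPS = {
    "nonpolar": set("ALIVPWM"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}


def group_of(residue):
    """Return the class name of a residue, or None if it is not listed."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    return None


def percent_identity(original, variant, conservative_counts=False):
    """Percent identity of two equal-length, pre-aligned sequences."""
    if not original or len(original) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = 0
    for a, b in zip(original.upper(), variant.upper()):
        if a == b:
            matches += 1
        elif conservative_counts:
            # Treat a substitution within one class as identical.
            ga = group_of(a)
            if ga is not None and ga == group_of(b):
                matches += 1
    return 100.0 * matches / len(original)


print(percent_identity("ALKD", "AIRE"))
print(percent_identity("ALKD", "AIRE", conservative_counts=True))
```

In practice an alignment tool would be used to handle insertions and deletions before identity is computed; that step is outside the scope of this sketch.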

The term “vector” as used herein refers to a nucleic acid or a virus or portion thereof (e.g., a viral capsid or genome) capable of mediating entry of, e.g., transferring, transporting, etc., a nucleic acid into a cell. Where the vector is a nucleic acid, the nucleic acid to be transferred is generally linked to, e.g., present in, the vector. A nucleic acid vector may include sequences that direct autonomous replication (e.g., an origin of replication) and/or may include sequences sufficient to allow integration of part or all of the nucleic acid into host cell genomic DNA. Useful nucleic acid vectors include, for example, naturally occurring or modified viral genomes or portions thereof or nucleic acids (DNA or RNA) that can be packaged into viral capsids, DNA or RNA plasmids, and transposons. Plasmid vectors typically include an origin of replication and may include one or more selectable marker genes. Plasmids may comprise part or all of a viral genome (e.g., a viral promoter, enhancer, processing or packaging signals, etc.). Viruses or portions thereof that can be used to introduce nucleic acid molecules into cells are referred to as viral vectors. Useful viral vectors include adenoviruses, adeno-associated viruses, retroviruses, lentiviruses, vaccinia virus and other poxviruses, herpesviruses (e.g., herpes simplex virus), and others. In some embodiments a virus having tropism for a particular cell type (e.g., neurons or a particular type of neuron) may be used. Examples of expression vectors that may be used in mammalian cells include, e.g., the pcDNA vector series, pSV2 vector series, pCMV vector series, pRSV vector series, pEF1 vector series, Gateway® vectors, etc. Useful transposons include, e.g., Tol2, Minos, Sleeping Beauty (SB) and PiggyBac (PB). One of ordinary skill in the art appreciates how to use a viral vector, plasmid, transposon system, or other vector to introduce a DNA sequence of interest into the genome of a cell.
For example, it would be understood that a transposase would be supplied to the cell if a transposon vector is used.

The term “CpG island” (CGI) refers to a region of genomic DNA that has an elevated G+C content (proportion of nucleotides that are either G or C) as compared with the mammalian genome as a whole, in which CpG dinucleotides are underrepresented. In vertebrates, CpG islands are enriched in certain regions of the genome involved in initiation of gene transcription, such as promoters. CGIs colocalize with the majority of annotated gene promoters in both the human and mouse genomes, including most housekeeping genes and a number of tissue-specific genes and developmental regulator genes. A promoter that contains, is contained in, or overlaps with a CGI may be referred to as a “CGI promoter”. Such a CGI is said to be associated with or colocalized with the promoter, and vice versa. CGIs frequently exist in an unmethylated state that is transcriptionally permissive and marked by histone modifications that are characteristic of transcriptionally active chromatin such as histone acetylation (H3/H4Ac) and H3K4me3. While often unmethylated in normal cells, CGIs can become methylated under certain conditions and in certain tissues. DNA methylation of CGIs is associated with stable long-term silencing of CGI promoters.

In some aspects, CGIs are identified as regions of genomic DNA at least 200 bp in length that have a G+C content of at least 50% and a CpG frequency (observed/expected) of at least 0.6 (Gardiner-Garden, M. and Frommer, M. (1987) CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261-282). The observed to expected (O/E) ratio in a given DNA segment can be calculated by dividing the proportion of CpG dinucleotides in the segment by what is expected by chance, which can be calculated using the formula

$\mathrm{O/E} = \frac{\#\mathrm{CpG}/N}{(\#\mathrm{C}/N) \times (\#\mathrm{G}/N)}$

where N is the number of base pairs (bp) in the segment. In some aspects, the CGI definition of Gardiner-Garden and Frommer (GF definition) is refined by excluding sequences that meet the above criteria but lie within or substantially overlap a repetitive sequence in the genome. Repetitive sequences include those sequence elements known as LINEs, SINEs, and Alu sequences, which are well known in the art. In some embodiments a CGI does not comprise or consist of or overlap with an Alu sequence or other repetitive sequence found in a genome of interest. In some embodiments a CGI is at least 300 bp, at least 400 bp, or at least 500 bp long, e.g., between about 500 bp and about 1 kb, between about 1 kb and about 2 kb, between about 2 kb and about 5 kb, or between about 5 kb and about 10 kb long. Exclusion of repetitive sequences can be achieved by applying the criteria of the GF definition to a modified version of a genome in which repeats have been masked and are not considered for purposes of identifying sequences that meet the criteria. RepeatMasker is a computer program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output includes a modified version of the query sequence in which all the annotated repeats have been masked (Smit, A F A, Hubley, R & Green, P. RepeatMasker Open-3.0. 1996-2010; available on the worldwide web at subdomain repeatmasker.org). Other programs that may be used to identify repeats include WindowMasker (Morgulis A., et al., Bioinformatics 2006; 22:134-141) and Tandem Repeats Finder (Benson G. Nucleic Acids Res. 1999; 27:573-580). CGIs identified based on the GF definition applied to various vertebrate genomes (e.g., human, mouse) are available in the UCSC Genome Browser in the “CpG Islands” tracks. The UCSC Genome Browser provides the option to use either a masked or unmasked genome. “CpG island shores” are the regions extending 2 kb on either side of a CpG island.
These regions have a lower CpG density than do CGIs and harbor numerous cancer-specific and tissue-specific differentially methylated regions. A “low CpG region” is a region that has a lower CpG density than that found in CpG islands.
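The GF criteria described above (length of at least 200 bp, G+C content of at least 50%, and CpG O/E of at least 0.6) can be sketched in a few lines. The code below is an illustrative sketch only. The function names `cpg_stats` and `meets_gf_definition` are hypothetical, the O/E ratio is computed exactly as in the formula above with N taken as the segment length, and repeat masking is not performed.

```python
def cpg_stats(seq):
    """Return (G+C fraction, CpG observed/expected ratio) for one segment."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n
    expected = (c / n) * (g / n)
    # O/E is reported as 0.0 when no CpG can be expected (no C or no G).
    oe = (cpg / n) / expected if expected > 0 else 0.0
    return gc_fraction, oe


def meets_gf_definition(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Check one segment against the thresholds stated in the text."""
    gc_fraction, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and oe >= min_oe


segment = "CG" * 100  # artificial 200 bp segment, not a real genomic sequence
print(cpg_stats(segment), meets_gf_definition(segment))
```

Genome-wide CGI calling would typically apply such a test in sliding windows over a repeat-masked genome; the windowing strategy is left out here because the text does not specify one.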

The term “differentially methylated region” (DMR) refers to a region of genomic DNA that is differentially marked by DNA methylation (has a different methylation pattern) in two or more settings. Unless otherwise specified, the term “differentially methylated region” as used herein refers to a region of genomic DNA that is differentially methylated in two homologous parental chromosomes present in a cell, i.e., the region is differentially methylated in a parent-of-origin specific manner. In other words, the methylation level within the region differs depending on whether it is in the paternal or maternal chromosome. Such a DMR may be referred to as a parent-of-origin DMR. The term “differentially methylated region” may sometimes be used to refer to a region of genomic DNA that is differentially marked by methylation in two or more settings on either or both chromosomes in a way that is not determined by the parental origin of the chromosomes. Where this use is intended herein, the term “differentially methylated region” will be immediately preceded by a word or phrase that contains the term “specific” and refers to the settings in which the region is differentially methylated. The two or more settings may, for example, be two or more cell types or cell states. For example, a “tissue-specific DMR” refers to a region of genomic DNA that is differentially marked by methylation in two or more different tissues or cell types. A “disease-specific DMR” refers to a region of genomic DNA that is differentially marked by methylation in tissues or cells affected by a disease as compared with tissues or cells that are otherwise matched but are not affected by a disease (normal tissues or cells). A “reprogramming-specific DMR” refers to a region of genomic DNA that is differentially marked by methylation in reprogrammed cells (i.e., cells that are undergoing or result from reprogramming) as compared with the original cells.

A “germline differentially methylated region” (gDMR) is a DMR that becomes differentially methylated in the germline. Thus, gDMRs are already differentially methylated in the gametes at the time of fertilization. Some gDMRs are methylated during oogenesis while the others are methylated during spermatogenesis. Therefore, in a given diploid cell or organism, certain gDMRs (those methylated during oogenesis) are methylated on the maternally inherited chromosome, and certain gDMRs (those methylated during spermatogenesis) are methylated on the paternally inherited chromosome.

A “secondary differentially methylated region”, also referred to as a “somatic differentially methylated region”, is a differentially methylated region that becomes differentially methylated after fertilization. Secondary DMRs are subsequently maintained throughout normal development, and are therefore not regulated by the DNA methylation machinery in a tissue-specific manner.

The present disclosure encompasses the recognition that studies of epigenetic changes such as DNA methylation have heretofore been hampered by two experimental constraints that limit mechanistic studies of methylation and gene regulation. Changes in DNA methylation during processes such as development, lineage commitment, and disease are dynamic. One limitation of standard methods for methylation analysis (i.e., methods used in the art for methylation analysis prior to the present disclosure) is that they provide only a static “snapshot” view of the methylation state during cell state transitions. Prior to the present disclosure, following the dynamics of DNA methylation has been hindered by the inability to translate epigenetic changes into a traceable readout. Another limitation of standard methods for methylation analysis is that they are based on examining bulk populations of cells, precluding assessment of methylation changes in individual cells.

Described herein is a DNA methylation reporter (also referred to as a Reporter of Genomic Methylation (RGM) or “RGM construct”) that permits detection of genomic methylation states in individual cells. In some aspects, a DNA methylation reporter described herein allows the tracing of real-time changes in DNA methylation in live cells. The DNA methylation reporter comprises a promoter that, when introduced into DNA in proximity to a region of interest (e.g., a region comprising CpG dinucleotides), may be utilized to report on methylation changes of the adjacent sequences.

The design of the DNA methylation reporter is based at least in part on the insight that a promoter useful for reporting on methylation of a DNA region of interest should preferably be one whose activity (i.e., activity with regard to directing (driving) transcription of an operably linked DNA sequence) is sensitive to exogenous methylation changes (i.e., methylation changes outside of the promoter itself) without being independently regulated by the DNA methylation machinery. In other words, the activity of the promoter can be affected by exogenous methylation changes but should not ordinarily be subject to regulation by methylation during the processes of development or cellular differentiation. The DNA methylation reporter described herein comprises a promoter whose activity can be affected by exogenous methylation changes without being independently regulated by the DNA methylation machinery. Such a promoter may be referred to herein as an “RGM promoter”. An RGM construct comprises an RGM promoter operably linked to a nucleic acid sequence that encodes a reporter molecule. In general, the RGM promoter is located upstream of (i.e., in the 5′ direction from) the sequence that encodes the reporter molecule. In some aspects, described herein is the identification of suitable promoters and their use as sensors for DNA methylation of a DNA region of interest.

In order to use an RGM construct to measure methylation of a DNA region of interest, the RGM construct is positioned in proximity to the DNA region of interest in a cell. For example, the RGM construct may be integrated into a region of interest in the genome of the cell. The cell is subsequently assayed for the reporter molecule. Transcription of the reporter gene (the DNA sequence that encodes the reporter molecule) is dependent on activity of the RGM promoter, which is sensitive to the level of methylation of the region of interest. Activity of the RGM promoter allows transcription of the reporter gene, producing RNA that encodes the reporter molecule. The level of the reporter molecule serves as an indicator of the level of methylation of the region of interest.

Thus, in some aspects, described herein is a method of detecting the methylation state of a DNA region of interest in the genome of a cell comprising: a) providing a cell comprising a nucleic acid comprising an RGM promoter operably linked to a nucleic acid sequence that encodes a reporter molecule, wherein the nucleic acid is integrated in proximity to a region of interest in the genome of the cell; and b) measuring expression of the reporter molecule by the cell, wherein the level of expression of the reporter molecule is indicative of the level of methylation of the region of interest, thereby detecting the methylation state of the region of interest. For example, in some embodiments, lack of expression of the reporter molecule is indicative of methylation, e.g., hypermethylation, of the region of interest, while expression of the reporter molecule is indicative of low or absent methylation of the region of interest.

In some embodiments, the RGM promoter is active if the region of interest is hypomethylated, thus allowing transcription of the reporter gene. In some embodiments, methylation of the region of interest inhibits activity of the RGM promoter, thereby inhibiting transcription of the reporter gene. In some embodiments, if the cell is positive for the reporter molecule, this indicates that the region of interest is hypomethylated. In some embodiments, if a cell is negative for the reporter molecule, this indicates that the region of interest is hypermethylated.

In some embodiments, the RGM promoter is inactive if the region of interest is hypomethylated, thus preventing transcription of the reporter gene. In some embodiments, methylation of the region of interest increases activity of the RGM promoter, thereby increasing transcription of the reporter gene. In some embodiments, if the cell is positive for the reporter molecule, this indicates that the region of interest is hypermethylated. In some embodiments, if a cell is negative for the reporter molecule, this indicates that the region of interest is hypomethylated.

In some embodiments, a change in the level of the reporter molecule indicates a change in the level of methylation of the region of interest. For example, in some embodiments, an increase in the level of the reporter molecule over a period of time indicates that the RGM promoter has become more active and, therefore, that the region of interest has undergone a change in methylation (e.g., has become less densely methylated) during that period. In some embodiments a decrease in the level of the reporter molecule over a period of time indicates that the RGM promoter has become less active and, therefore, that the region of interest has undergone a change in methylation (e.g., has become more densely methylated) during that period. Without wishing to be bound by any theory, it is believed that methylation may be propagated from the DNA region of interest into the RGM promoter, resulting in modulation of its transcriptional activity.

Depending on the particular RGM promoter, methylation may increase or decrease its transcriptional activity. As discussed further below, the Snrpn promoter is exemplified herein in detail as an RGM promoter. In the case of an RGM construct comprising a Snrpn promoter, methylation of the DNA region of interest reduces transcriptional activity, thus reducing production of the reporter molecule, and demethylation of the DNA region of interest increases transcriptional activity, thus increasing production of the reporter molecule.
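The interpretation rules in the preceding paragraphs depend on whether methylation represses or increases activity of the particular RGM promoter. As a minimal sketch, the decision logic could be written as follows. The function names and the `methylation_represses` flag are hypothetical, and the default of `True` corresponds to the Snrpn-type behavior described above. Real reporter data would require thresholds and controls that are not modeled here.

```python
def infer_state(reporter_positive, methylation_represses=True):
    """Map a reporter-positive/negative call to an inferred methylation state."""
    if methylation_represses:
        return "hypomethylated" if reporter_positive else "hypermethylated"
    return "hypermethylated" if reporter_positive else "hypomethylated"


def infer_change(level_before, level_after, methylation_represses=True):
    """Map a change in reporter level over time to an inferred methylation change."""
    if level_after == level_before:
        return "no change detected"
    reporter_increased = level_after > level_before
    # With a repressed-by-methylation promoter, more reporter implies less methylation.
    if reporter_increased == methylation_represses:
        return "methylation decreased"
    return "methylation increased"


print(infer_state(True))
print(infer_change(10.0, 2.5))
```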

In some embodiments, an RGM construct is integrated into the genome of a mammalian cell in proximity to a DNA region of interest (ROI) in the genome of the cell. The RGM construct may be used to report on methylation of the region of interest (i.e., to provide a measurable indication of the methylation state of the region of interest). The RGM construct may be integrated within the DNA region of interest or the 5′ or 3′ end of the RGM construct may be directly adjacent to the DNA region of interest or may be located up to about 5 nt, 10 nt, 50 nt, 100 nt, 250 nt, 500 nt, 1 kb, 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, or 20 kb from the nearest nucleotide of the DNA region of interest. In some embodiments the RGM construct is located 5′ with respect to the ROI. In some embodiments the RGM construct is located 3′ with respect to the ROI. In some embodiments an RGM construct is integrated at a predetermined location in the genome in proximity to a region of interest using any of a variety of methods for genome modification (see discussion below). In some embodiments an RGM construct is integrated into the genome at a random location, where “random” in this context means that the location is not predetermined by the artisan. The RGM construct may then be used to report on methylation in a region of the genome in proximity to the location at which it is inserted. If desired, the region of the genome or the location at which the RGM construct is integrated may be identified, e.g., by sequencing.
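Once an integration site has been identified, the distance between the RGM construct and the region of interest can be checked against a chosen limit. The sketch below assumes that both features lie on the same chromosome and are given as 1-based inclusive (start, end) coordinates. The function names are hypothetical, and counting the nucleotides that lie between the two features (so that directly adjacent or overlapping features have distance 0) is one possible convention rather than a definition taken from the text.

```python
def distance_to_roi(construct, roi):
    """Number of nucleotides separating the construct from the ROI.

    Both arguments are (start, end) tuples, 1-based and inclusive, on the
    same chromosome. Overlapping or directly adjacent features return 0.
    """
    c_start, c_end = construct
    r_start, r_end = roi
    if c_start > c_end or r_start > r_end:
        raise ValueError("start must not exceed end")
    if c_end < r_start:   # construct lies entirely before the ROI
        return r_start - c_end - 1
    if r_end < c_start:   # construct lies entirely after the ROI
        return c_start - r_end - 1
    return 0              # construct overlaps or sits within the ROI


def in_proximity(construct, roi, max_distance=20_000):
    """True if the construct is within max_distance nt of the ROI."""
    return distance_to_roi(construct, roi) <= max_distance


print(distance_to_roi((100, 200), (211, 300)))
```

The default limit of 20 kb is the largest distance listed above; a smaller value can be passed for the shorter distances.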

In some embodiments, after integration of a nucleic acid comprising an RGM construct into the genome of a cell, DNA comprising the RGM promoter and, optionally, at least a portion of a DNA region of interest, may be isolated and its methylation state determined using standard methodology for methylation analysis. For example, the DNA may be subjected to bisulfite treatment, amplified (e.g., by PCR), and sequenced. Determining the methylation state of the RGM promoter and, optionally, the methylation state of at least a portion of a DNA region of interest, using standard methodology may be performed in order to confirm that the RGM is faithfully reporting on the methylation state of sequences in its proximity.
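The logic of bisulfite sequencing used for such confirmation can be illustrated in silico. Bisulfite converts unmethylated cytosine to uracil, which is read as thymine after amplification, while 5-methylcytosine remains cytosine. The sketch below is a simplified model under stated assumptions: complete conversion, top strand only, and a read that aligns to the reference without gaps. The function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate the read obtained after bisulfite treatment and PCR.

    `methylated_positions` holds 0-based indices of methylated cytosines.
    Unmethylated C is read as T; methylated C is read as C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def cpg_methylation_calls(reference, read):
    """For each CpG in the reference, report whether the read retains C."""
    if len(reference) != len(read):
        raise ValueError("read must align to the reference without gaps")
    reference = reference.upper()
    read = read.upper()
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            calls[i] = read[i] == "C"  # True means called methylated
    return calls


ref = "ACGTCGA"  # toy sequence, not taken from any genome
read = bisulfite_convert(ref, methylated_positions={1})
print(read, cpg_methylation_calls(ref, read))
```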

While it is contemplated that a DNA region of interest whose methylation state is measured using an RGM reporter will often be located in the genome of a cell, in some embodiments an RGM construct may be used to report on methylation of a region of interest in extrachromosomal DNA, such as a region of DNA in an episomal vector (e.g., an oriP/EBNA-1 episome), minicircle DNA, or other type of extrachromosomal DNA. In some embodiments, an RGM construct is introduced into an extrachromosomal DNA element prior to introduction of the DNA element into a cell. It should also be understood that in embodiments in which a cell comprises two or more reporter constructs or expression cassettes, any one or more of such constructs may be integrated into the genome or may be in an episome in various embodiments.

In some aspects, the disclosure is based in part on the discovery that promoters of imprinted genes (also referred to as “imprinted gene promoters”) can serve as methylation sensors and are suitable promoters for use in a DNA methylation reporter. Imprinted gene promoters exhibit inherent sensitivity to DNA methylation of adjacent or nearby genomic regions, resulting in transcriptional activation or silencing of the imprinted gene. Methylation of a genomic region in proximity to an imprinted gene promoter can lead to methylation of the imprinted gene promoter. Depending on the particular imprinted gene promoter, methylation can inhibit transcriptional activity of the promoter or increase transcriptional activity of the promoter. This mechanism has been established for a subgroup of germline-derived differentially methylated regions (DMRs) that act as imprinting control regions and affect in cis the methylation state of secondary regulatory promoter elements, which in turn control imprinted gene promoter activity. The methylation state of such promoter elements is subsequently maintained throughout normal development, and therefore not regulated by the DNA methylation machinery in a tissue-specific manner. The present disclosure provides the insight that these characteristics of imprinted gene promoters make them well suited to serve as DNA methylation sensors.

Accordingly, in some embodiments, the promoter in an RGM construct of the present disclosure is an imprinted gene promoter. An example of imprinting occurs in the so-called Prader-Willi Angelman (PWA) region on human chromosome 15 (in 15q11-13) or the orthologous region on mouse chromosome 7, in which a DMR associated with the small nuclear ribonucleoprotein polypeptide N (Snrpn) gene promoter region controls its parent-of-origin monoallelic expression. In both humans and mice, the upstream region of the Snrpn gene comprises a region that is densely methylated only on the maternal allele, which is silenced.

In some embodiments, the imprinted gene promoter in an RGM construct is derived from the Snrpn gene. As described in the Examples, an RGM construct comprising a minimal Snrpn promoter operably linked to a reporter gene can faithfully report on changes in DNA methylation associated with a nearby DNA region of interest. For example, an RGM construct comprising a minimal Snrpn promoter, when positioned in proximity to a CpG island, can be used to accurately report on gain and loss of DNA methylation of the CpG island. An RGM construct inserted into the genome of a cell can be used to accurately detect DNA methylation changes in non-coding regulatory regions such as enhancers and super-enhancers.

In some embodiments the sequence of the promoter in an RGM construct is from the Snrpn promoter region that drives transcription of a bicistronic transcript that encodes Snrpn protein and a protein identified as the Snrpn upstream reading frame (Snurf). This promoter region is also known as the Snurf-Snrpn promoter region. Where the present disclosure refers to the Snrpn promoter region, it should be understood that the term refers to the Snurf-Snrpn promoter region, and the promoter of the Snrpn gene refers to the promoter that drives transcription of the bicistronic transcript that encodes the Snrpn and Snurf proteins (Snurf-Snrpn transcript). Those of ordinary skill in the art will appreciate that transcription of certain other transcripts that also encode Snrpn but lack the complete open reading frame for Snurf is driven by different promoter(s) located upstream. The bicistronic transcript corresponds to RefSeq accession number NM_013670.3 (mouse) or NM_003097.3 (human). In some embodiments the sequence of the promoter in an RGM construct comprises or consists of the following sequence from the Snrpn promoter region (where underlining indicates a portion of the sequence that is highly conserved between the mouse and human Snrpn promoter regions):

(SEQ ID NO: 1) ACGCTCAAATTTCCGCAGTAGGAATGCTCAAGCATTCCTTTTGGTAGCTGCCTTTTGGCAGGACATTCCGGTCAGAGGGACAGAGACCCCTGCATTGCGGCAAAAATGTGCGCATGTGCAGCCATTGCCTGGGACGCATGCGTAGGGAGCCGCGCGACAAACCTGAGCCATTGCGGCAAGACTAGCGCAGAGAGGAGAGGGAGCCGGAGATGCCAGACGCTTGGTTCTGAGGAGTGATTTGCAACGCAATGGAGCGAGGAAGGTCAGCTGGGCTTGTGGATTCT.

In some embodiments the sequence of the promoter in an RGM construct comprises or consists of a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 1 across a portion of SEQ ID NO: 1 that comprises at least 150, 175, 200, 210, 220, 230, 240, 250, 260, 270, 280 or all 284 nucleotides of SEQ ID NO: 1. In some embodiments the portion of SEQ ID NO: 1 is highly conserved between the mouse and human Snrpn promoter regions. For example, in some embodiments the promoter comprises or consists of a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to nucleotides 59-264 of SEQ ID NO: 1, i.e., at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the following sequence:

(SEQ ID NO: 2) CAGGACATTCCGGTCAGAGGGACAGAGACCCCTGCATTGCGGCAAAAATGTGCGCATGTGCAGCCATTGCCTGGGACGCATGCGTAGGGAGCCGCGCGACAAACCTGAGCCATTGCGGCAAGACTAGCGCAGAGAGGAGAGGGAGCCGGAGATGCCAGACGCTTGGTTCTGAGGAGTGATTTGCAACGCAATGGAGCGAGGAAGGT.

In some embodiments the promoter in an RGM construct comprises or consists of a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to a portion of SEQ ID NO: 1 at least 150 nucleotides long, starting at any position of SEQ ID NO: 1 between positions 1 and 100 (e.g., position 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100) and extending up to any position of SEQ ID NO: 1 at or above position 200, e.g., at or above position 210, 220, 230, 240, 250, 260, 270, 280, or 284. All combinations of starting and ending positions are disclosed. For example, in some embodiments the promoter comprises or consists of a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the sequence extending from position X to position Y of SEQ ID NO: 1, where X can be any integer between 1 and 100, and Y can be any integer between 200 and 284. In some embodiments the promoter in an RGM construct comprises or consists of a sequence at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identical to nucleotides 60-264, 65-264, 70-264, 75-264, 80-264, 85-264, 90-264, 95-264, 100-264, 105-264, 110-264, 115-264, 120-264, 125-264, 130-264, 135-264, or 140-264 of SEQ ID NO: 1.
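The position-based portions and identity thresholds described above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the disclosure: positions are treated as 1-based and inclusive, the ranges for X and Y are treated as inclusive, and percent identity is computed by a simple ungapped, position-by-position comparison of two equal-length sequences (in practice an alignment tool would typically be used). The function names and the toy sequences are hypothetical.

```python
def portion(seq, x, y):
    """Return nucleotides x..y of seq, using 1-based inclusive positions."""
    if not 1 <= x <= y <= len(seq):
        raise ValueError("positions must satisfy 1 <= x <= y <= len(seq)")
    return seq[x - 1:y]


def percent_identity(candidate, reference):
    """Ungapped percent identity of two equal-length sequences.

    Simplifying assumption: no alignment is performed, so insertions and
    deletions are not handled.
    """
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def start_end_combinations(starts=range(1, 101), ends=range(200, 285)):
    """Enumerate every (X, Y) pair with X in 1..100 and Y in 200..284."""
    return [(x, y) for x in starts for y in ends]


# Toy example (not a sequence from the disclosure).
reference = portion("TTACGTACGTAA", 3, 10)   # "ACGTACGT"
candidate = "ACGTACGA"                       # one mismatch in eight positions
identity = percent_identity(candidate, reference)
meets_threshold = identity >= 85.0
n_combinations = len(start_end_combinations())
```

Under these assumptions, a candidate promoter sequence would be compared against `portion(seq, X, Y)` of SEQ ID NO: 1 for a chosen X and Y, and accepted if its identity meets the selected threshold.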

The Snrpn promoter set forth in SEQ ID NO: 1 contains 16 CG dinucleotides. For purposes of description, the CG dinucleotides can be numbered consecutively from 1 to 16, starting with the CG at positions 2-3 (CG #1) and ending with the CG at positions 255-256 (CG #16). In some embodiments, a variant of SEQ ID NO: 1 comprises a sequence that includes at least 8, 9, 10, 11, 12, 13, 14, 15, or all 16 of the CG dinucleotides of SEQ ID NO: 1 (i.e., these CG dinucleotides are not mutated or absent from the sequence).
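The CG numbering described above can be reproduced with a short Python sketch. The sequence is SEQ ID NO: 1 as printed in this disclosure, split into chunks only for readability. The helper names are hypothetical, and `retained_cg_count` assumes, for simplicity, a variant of the same length as SEQ ID NO: 1 with substitutions only (no insertions or deletions).

```python
# SEQ ID NO: 1 as printed in the disclosure (chunked for readability).
SEQ_ID_NO_1 = (
    "ACGCTCAAATTTCCGCAGTAGGAATGCTCAAGCATTCCTTTTGGTAGCTGCCTTTTGGCA"
    "GGACATTCCGGTCAGAGGGACAGAGACCCCTGCATTGCGGCAAAAATGTGCGCATGTGCA"
    "GCCATTGCCTGGGACGCATGCGTAGGGAGCCGCGCGACAAACCTGAGCCATTGCGGCAAG"
    "ACTAGCGCAGAGAGGAGAGGGAGCCGGAGATGCCAGACGCTTGGTTCTGAGGAGTGATTT"
    "GCAACGCAATGGAGCGAGGAAGGTCAGCTGGGCTTGTGGATTCT"
)


def cg_positions(seq):
    """Return 1-based (C position, G position) pairs for each CG dinucleotide."""
    seq = seq.upper()
    return [(i + 1, i + 2) for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def retained_cg_count(variant, reference=SEQ_ID_NO_1):
    """Count reference CG dinucleotides still present at the same positions.

    Assumption: variant has the same length as reference (substitutions only).
    """
    if len(variant) != len(reference):
        raise ValueError("variant must have the same length as the reference")
    variant = variant.upper()
    return sum(variant[c - 1:g] == "CG" for c, g in cg_positions(reference))


positions = cg_positions(SEQ_ID_NO_1)
print(len(SEQ_ID_NO_1), len(positions), positions[0], positions[-1])

# Hypothetical variant in which CG #1 is mutated (C at position 2 changed to T).
variant = "ATG" + SEQ_ID_NO_1[3:]
print(retained_cg_count(variant))
```

For the sequence as printed, the sketch reports a length of 284 nucleotides and 16 CG dinucleotides, the first at positions 2-3 and the last at positions 255-256, consistent with the numbering given above.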

In some embodiments a Snrpn promoter used in an RGM construct may comprise additional sequence from the Snrpn promoter region. FIG. 9 shows sequences from the Snrpn promoter region, including the minimal Snrpn promoter as well as upstream sequences (SEQ ID NO: 3). In some embodiments an RGM promoter comprises an additional approximately 100, 200, 300, 400, 500 nt, or more of the sequence located upstream of the minimal Snrpn promoter. Any of the RGM constructs described herein may comprise a Snrpn promoter, e.g., a minimal Snrpn promoter, operably linked to a reporter gene.

Although the Snrpn promoter is exemplified in most detail herein, it should be understood that other imprinted gene promoters may be used in certain embodiments. In some embodiments the sequence of an RGM construct comprises at least a portion of the sequence extending from nucleotide position −5000 to nucleotide position +5000 in the genome of a mammal (e.g., a mouse, rat, or human), where +1 represents the TSS of an imprinted gene, negative numbers represent nucleotide positions located 5′ to the TSS, and positive numbers (whether or not shown with a plus sign) represent nucleotide positions located 3′ to the TSS. In some embodiments the length of the sequence that is included in the RGM construct is between about 200 nt and about 500 nt, between about 500 nt and about 1000 nt, between about 1000 nt and about 2000 nt, between about 2000 nt and about 3000 nt, between about 3000 nt and about 4000 nt, or between about 4000 nt and about 5000 nt. In some embodiments the sequence in an RGM construct comprises or consists of a sequence that extends from about nucleotide position −5000, −4500, −4000, −3500, −3000, −2500, −2000, −1900, −1800, −1700, −1600, −1500, −1400, −1300, −1200, −1100, −1000, −900, −800, −700, −600, −500, −400, −300, −250, −200, −150, −100, or −50 with respect to the TSS (position +1) of an imprinted gene, up to and including position +1 (the TSS), 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2500, 3000, 3500, 4000, 4500, or 5000 of an imprinted gene, or a variant of such a sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% identical to the sequence. All combinations of starting and ending positions are disclosed. In some embodiments the sequence comprises the 5′ untranslated region of the imprinted gene. In some embodiments the sequence comprises at least the first exon of the imprinted gene.
In some embodiments the sequence comprises one, two, or more CpG islands (CGIs). In some embodiments the imprinted gene promoter is associated with a parent-of-origin DMR. In some embodiments the DMR is a germline-derived DMR. In some embodiments the DMR is a secondary DMR. In some embodiments the imprinted gene promoter is associated with a CGI. An imprinted gene promoter is considered to be associated with a DMR or CGI if the imprinted gene promoter comprises, overlaps with, or is located in proximity to (e.g., within) a DMR or CGI, respectively. In some embodiments the sequence of the promoter in an RGM construct is derived from an imprinted gene promoter that is associated with a DMR, and the RGM construct comprises an at least 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, or 1000 nt portion of the sequence of the DMR, or a variant of such a sequence that is at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% identical to such portion.

In some embodiments the sequence of the promoter in an RGM construct comprises an at least 150, 200, 250, 300, 400, or 500 nt portion of an imprinted gene promoter region wherein the sequence of the portion is highly conserved between the human and mouse orthologs of the gene. For example, the sequence may be at least 95%, 96%, 97%, 98%, 99%, 99.5%, or 100% identical in the human and mouse orthologs of the imprinted gene. In some embodiments the sequence of the promoter in an RGM construct consists of an at least 150, 200, 250, 300, 400, or 500 nt portion of an imprinted gene promoter region, wherein the sequence of the portion is highly conserved between the human and mouse orthologs of the gene, and, in some embodiments, further comprising up to 25, 50, 100, 150, or 200 nt of the sequence that is located upstream and/or up to 25, 50, 100, 150, or 200 nt of the sequence that is located downstream from the highly conserved portion. In some embodiments the imprinted gene is a gene that is widely expressed in mammalian tissues, such as Snrpn, e.g., the gene is expressed in the predominant cell types found in 10 or more organs or tissues. In some embodiments the imprinted gene may be less widely expressed, e.g., its expression may be tissue or cell type specific. In some embodiments an RGM construct comprising an imprinted gene promoter that is selectively expressed in one or more tissues or cell types may be integrated into the genome of a cell of such cell type. In some embodiments an RGM construct comprising an imprinted gene promoter that is selectively expressed in one or more tissues or cell types may be used to detect or monitor methylation of a ROI in one or more of those tissues or cell types. In some embodiments the imprinted gene is a gene that is imprinted in at least mice and humans.
In some embodiments the imprinted gene is imprinted in a species-specific manner, e.g., it is imprinted in mice and not in humans, or vice versa. In some embodiments the imprinted gene is imprinted in at least mice, rats, humans, cattle, sheep, or horses. In some embodiments an RGM construct comprising an imprinted gene promoter that is imprinted in a species-specific manner may be integrated into the genome of a cell of a species in which the gene is imprinted. In some embodiments an RGM construct comprising an imprinted gene promoter that is imprinted in a species-specific manner may be used to detect or monitor methylation state of a ROI in cells of a species in which the gene is imprinted.

In some embodiments the mammalian imprinted gene promoter is from the Igf2r, Gnas, Igf2, Meg3 (Gtl2), Airn, Kcnq1ot1, Mest, Grb10, or Peg10 gene (see Table 1 for Gene IDs of the human and mouse orthologs of these genes). In some embodiments the imprinted gene promoter is associated with a parent-of-origin DMR. For example, in some embodiments the imprinted gene promoter is from the Igf2r, Gnas, or Meg3 gene. In some embodiments the imprinted gene promoter comprises or overlaps a CpG island.

TABLE 1 Selected Mammalian Imprinted Genes

Gene Name    Gene ID (mouse)    Gene ID (human)
Igf2r        16004              3482
Gnas         14683              2778
Meg3         17263              55384
Igf2         16002              3481
Airn         104103             100271873
Kcnq1ot1     63830              10984
Mest         17294              4232
Grb10        14783              2887
Peg10        170676             23089

In some embodiments the imprinted gene is the Igf2r gene. The Igf2r promoter is associated with a DMR, which includes the Igf2r TSS. The DMR associated with the murine Igf2r promoter is depicted in FIG. 10 (SEQ ID NO: 4). In some embodiments the promoter in an RGM construct comprises or consists of a minimal Igf2r promoter. In some embodiments the RGM construct comprises a sequence extending from about position −350, −300, −250, −200, or −150 to about position +1, +100, +200, +300, +400, +500, or +600, where +1 is the TSS of the Igf2r gene. All combinations of starting and ending positions are disclosed. In some embodiments the RGM promoter comprises the CpG island in the Igf2r promoter region, or a variant thereof that is at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 99.5% identical thereto.

In some embodiments the imprinted gene is the Gnas gene. The Gnas promoter is associated with a DMR, which includes the Gnas TSS. The DMR associated with the murine Gnas promoter is depicted in FIG. 11 (SEQ ID NO: 5). In some embodiments the promoter in an RGM construct is a minimal Gnas promoter. In some embodiments the RGM construct comprises a sequence extending from about position −600, −550, −500, −450, −400, −350, −300, −250, −200, or −150 to about position +1, +100, +200, +300, +400, +500, or +600, where +1 is the TSS of Gnas. In some embodiments the RGM construct comprises a sequence extending from about position −2820, −2500, −2000, −1500, or −1000 to about position −10, −1, +1, +100, +200, +300, +400, +500, or +600, where +1 is the TSS of Gnas. In some embodiments the sequence further comprises a sequence extending from about position +600 to about position +1000, +1500, +2000, +2500, or +3000, where +1 is the TSS of Gnas. All combinations of starting and ending positions are disclosed.

In some embodiments the imprinted gene is the Meg3 (Gtl2) gene. The Meg3 promoter is associated with a DMR, which includes the Meg3 TSS. The DMR associated with the murine Meg3 promoter is depicted in FIG. 12 (SEQ ID NO: 6). In some embodiments the promoter in an RGM construct comprises or consists of a minimal Meg3 promoter. In some embodiments the RGM construct comprises a sequence extending from about position −350, −300, −250, −200, or −150 to about position +1, +100, +200, +300, +400, +500, or +600, where +1 is the TSS of Meg3. All combinations of starting and ending positions are disclosed.

In some aspects, disclosed herein is a nucleic acid comprising an RGM construct and one or more additional DNA sequences. For example, in some embodiments, a nucleic acid comprising an RGM construct further comprises a second reporter construct. The second reporter construct typically encodes a reporter molecule that is distinguishable from the reporter molecule encoded by the RGM construct. In some embodiments, the second reporter construct may be used to identify or select for cells that have taken up the nucleic acid and that have the second reporter construct and the RGM construct stably integrated into their genome. In some embodiments the reporter gene in the second reporter construct is a selectable marker gene. The second reporter construct may be positioned either 5′ or 3′ with respect to the RGM construct in the nucleic acid. Typically, the promoter of the second reporter construct is one that is not subject to regulation by DNA methylation and is not affected by methylation of exogenous DNA (i.e., DNA outside the promoter in the second reporter construct). In some embodiments the promoter is a constitutive promoter. For example, in some embodiments the phosphoglycerate kinase (PGK) promoter, cytomegalovirus enhancer/chicken β-actin hybrid promoter (CAG promoter), cytomegalovirus (CMV) promoter, ubiquitin promoter, beta-actin promoter or elongation factor-1 alpha promoter is used. In some embodiments the nucleic acid may comprise an additional reporter construct that comprises a cell type or cell state specific promoter operably linked to a reporter gene. Expression of the reporter gene indicates that the cell type or cell state specific promoter is active and may be used to identify a cell as being of a particular cell type or as being in a particular cell state.
Thus, in some embodiments the nucleic acid may comprise a plurality of elements arranged as follows: RGM-SMC, RGM-SRC, RGM-SRC-SMC or RGM-SMC-SRC, where SMC represents a selectable marker cassette and SRC represents a second (or third) reporter construct. In some embodiments, a nucleic acid comprising an RGM construct does not comprise a selectable marker gene. In some embodiments, a nucleic acid comprising an RGM construct does not comprise a selectable marker cassette.

In some embodiments, a nucleic acid comprising an RGM construct serves as a donor nucleic acid for homologous recombination in order to introduce at least the RGM construct into the genome of a cell. To that end, in some embodiments, a nucleic acid comprising an RGM construct further comprises one or more nucleic acid sequences that are homologous to sequences in the mammalian genome that are located on one or both sides of a selected location in the genome that comprises a site at which the RGM construct is to be integrated. In some embodiments, homology arms may be positioned on each side of a segment of the nucleic acid that is to be integrated into the genome of a cell (see FIG. 3 for an example). For example, the nucleic acid may comprise a plurality of elements arranged as follows: HA1-RGM-[SMC]-[SRC]-HA2, where HA1 and HA2 represent first and second homology arms, SMC and SRC represent a selectable marker cassette and a second reporter cassette, respectively, and the brackets are used to indicate that the element within the brackets may or may not be present in various embodiments. The homologous sequences facilitate integration of the segment into the cell in a region of the genome comprising sequences that are homologous to the homology arms. Thus HA1 and HA2 may be homologous to adjacent sequences in a region of interest in the genome. In some embodiments, one of the homology arms is homologous to a region of the genome that is 5′ to a target location in the genome, and the other homology arm is homologous to a region of the genome that is 3′ to a target location in the genome. In some embodiments, a targetable nuclease that is programmed to cleave DNA at or within the target location is used to cut the genomic DNA. Repair by homologous recombination (homology-directed repair) using the nucleic acid as a donor results in incorporation of at least the region located between the homology arms into the genome.
Thus in some aspects, a nucleic acid comprising an RGM construct serves as a donor nucleic acid for homologous recombination to integrate the RGM construct into the genome in proximity to a region of interest.
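By way of illustration only, the donor arrangement HA1-RGM-[SMC]-[SRC]-HA2, in which bracketed elements may or may not be present, can be expanded into its concrete layouts with a short Python sketch. The function name is hypothetical, and the sketch keeps the element order exactly as written in the bracketed arrangement; it does not cover other orders mentioned herein, such as RGM-SRC-SMC.

```python
from itertools import product


def donor_layouts():
    """Expand HA1-RGM-[SMC]-[SRC]-HA2 into every concrete element order.

    Each bracketed (optional) element is independently present or absent.
    """
    layouts = []
    for with_smc, with_src in product((False, True), repeat=2):
        elements = ["HA1", "RGM"]
        if with_smc:
            elements.append("SMC")  # selectable marker cassette
        if with_src:
            elements.append("SRC")  # second reporter cassette
        elements.append("HA2")
        layouts.append("-".join(elements))
    return layouts


for layout in donor_layouts():
    print(layout)
```

The sketch yields four layouts, from the minimal HA1-RGM-HA2 to the full HA1-RGM-SMC-SRC-HA2; in each, only the segment between the homology arms would be expected to be incorporated into the genome.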

In some embodiments the one or more additional nucleic acid sequences comprise a DNA region of interest. In some embodiments, the DNA region of interest comprises a sequence that is normally hypermethylated when present in the genome of at least some mammalian cell types in its natural location. In some embodiments, the DNA region of interest comprises a sequence that is normally hypomethylated when present in the genome of at least some mammalian cell types in its natural location. In some embodiments, the DNA region of interest comprises at least a portion of a mammalian CpG island (CGI). In some embodiments the CGI is one that, when present in the genome of a mammalian cell in its natural location, is associated with a promoter that is normally widely expressed in a constitutive manner in vivo. In some embodiments the CGI is one that, when present in the genome of a cell in its natural location in vivo, is normally hypomethylated in its native state in vivo. For example, in some embodiments the CGI is associated with the Gapdh promoter. In mammalian cells the Gapdh promoter typically comprises a hypomethylated CGI, which is consistent with its constitutive expression in all tissues. In some embodiments the CGI is one that, when present in the genome of a cell in its natural location in vivo, is associated with a promoter that is normally expressed in a cell type specific manner. In some embodiments the CGI is one that is associated with a promoter that is normally expressed exclusively in germ cells. In some embodiments the CGI is one that is normally hypermethylated when present in the genome of a cell in its natural location in vivo. For example, in some embodiments the CpG island is associated with the Dazl promoter, which is expressed specifically in germ cells and is normally hypermethylated in all tissues except germ cells.

In some embodiments a nucleic acid may be contacted with one or more DNA modifying enzymes before being introduced into a cell. In some embodiments the DNA modifying enzyme comprises a methyltransferase. In some embodiments the DNA modifying enzyme comprises a CpG methyltransferase. In some embodiments the DNA modifying enzyme comprises a bacterial DNA methyltransferase. In some embodiments the DNA modifying enzyme comprises a eukaryotic DNA methyltransferase, e.g., a mammalian DNA methyltransferase. In some embodiments the nucleic acid construct is contacted with a CpG methyltransferase under appropriate conditions and for a sufficient time so that at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% of the CpGs in the RGM promoter are methylated. The nucleic acid may further comprise a ROI. The nucleic acid may be integrated into the genome of a cell and the methylation state of the ROI may subsequently be determined by measuring expression of the reporter molecule and/or by standard methodology for methylation analysis.

In some aspects, described herein is a nucleic acid comprising a mammalian imprinted gene promoter and a restriction site located 3′ with respect to the promoter. The restriction site is appropriately positioned to allow the insertion of a reporter gene of choice in order to create an RGM construct. The ordinary skilled artisan may select a reporter gene of choice from, e.g., those described herein.

A nucleic acid comprising an RGM construct and a DNA region of interest can be introduced into a cell and integrated into the genome of the cell at a random location or at a predetermined location. The RGM construct may be used to report on methylation of the DNA region of interest. If desired, the region of the genome or the location at which the nucleic acid is integrated may be identified, e.g., by sequencing. In some embodiments, the nucleic acid is subjected to methylation in vitro before the nucleic acid is introduced into a cell. For example, the nucleic acid may be contacted with a DNA methylating enzyme in vitro.

In some embodiments, a nucleic acid comprising an RGM construct (and, optionally, one or more additional DNA sequences such as an additional reporter construct, homology arms, and/or a DNA region of interest) is incorporated into a vector that can be used to transfer the nucleic acid into a cell. Any of a wide variety of vectors may be used in various embodiments. Those of ordinary skill in the art are aware of suitable vectors for introducing nucleic acids into cells of interest, e.g., mammalian cells. For example, DNA or RNA plasmids, viral vectors (e.g., based on adenoviruses, adeno-associated viruses, retroviruses, lentiviruses, vaccinia virus and other poxviruses, herpesviruses) or transposons may be used.

In general, any method known in the art for introducing nucleic acid constructs or vectors into cells may be used to introduce a nucleic acid or vector comprising an RGM construct into cells. One of ordinary skill in the art will select a suitable method depending on, e.g., the particular vector, cell type, or experimental conditions (e.g., in vitro or in vivo). In some embodiments, transfection, viral infection, electroporation, or microinjection may be used. Those of ordinary skill in the art are aware of suitable transfection reagents. In some embodiments an RGM construct or vector comprising an RGM construct may be injected into a living nonhuman animal, which may be an embryo, fetus, postnatal, juvenile, or adult animal. In some embodiments the animal may subsequently be subjected to imaging. In some embodiments cells that have an RGM construct integrated into their genome are introduced into a nonhuman animal. If the cells are not immunologically compatible (e.g., are of a different species or noncompatible strain), the animal may be immunocompromised if appropriate to reduce the likelihood that the cells would be rejected. In some embodiments the introduced cells may contribute to one or more organs or tissues of the non-human mammal, e.g., the nervous system.

In general, a reporter molecule may be measured at any time after introduction of the RGM construct into a cell or subject. In some embodiments the reporter molecule may be first measured between about 12 hours and about 7 days, between 1 and 2 weeks, between 2 and 6 weeks, between 8 and 12 weeks, or more after introducing the RGM construct into a cell or subject. In some embodiments, a stable cell line comprising a nucleic acid comprising an RGM construct integrated into its genome is derived.

In some embodiments, a control reporter construct may be introduced into cells in addition to an RGM reporter construct. In some embodiments, a control reporter construct comprises a sequence encoding a reporter molecule that is distinguishable from the reporter molecule encoded by the RGM construct, operably linked to a constitutive promoter whose activity is not affected either by methylation of the promoter itself or by methylation of sequences exogenous to the promoter. In some embodiments the control reporter construct may be used to normalize the signal from the reporter molecule encoded by the RGM construct.
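One possible way to carry out such normalization is sketched below in Python. The disclosure does not prescribe a particular formula; the sketch assumes, purely for illustration, that the RGM reporter signal is divided by the control reporter signal measured in the same cell or sample after subtracting an optional background value. The function name, parameters, and example values are hypothetical.

```python
def normalize_rgm_signal(rgm_signal, control_signal, background=0.0):
    """Return the RGM reporter signal relative to the control reporter signal.

    Assumption: a simple background-subtracted ratio. The control reporter
    is driven by a constitutive, methylation-insensitive promoter, so the
    ratio corrects for differences such as copy number or cell size.
    """
    control = control_signal - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return max(rgm_signal - background, 0.0) / control


# Hypothetical fluorescence readings for two cells (arbitrary units).
cell_a = normalize_rgm_signal(rgm_signal=500.0, control_signal=1000.0)
cell_b = normalize_rgm_signal(rgm_signal=120.0, control_signal=1100.0, background=100.0)
print(cell_a, cell_b)
```

Under this assumption, a lower normalized value would be consistent with greater silencing of the RGM promoter, since the control reporter signal is expected to be independent of methylation.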

In some embodiments a nucleic acid sequence encoding a reporter molecule (or other gene product to be expressed in a cell) comprises a transcription terminator, which term refers to a section of nucleic acid sequence that mediates transcriptional termination by providing signals in the newly synthesized RNA that trigger processes which release the RNA from the transcriptional complex. In the case of a eukaryotic mRNA transcribed by RNA polymerase II, a transcription terminator may comprise a sequence that is transcribed to produce a sequence that triggers cleavage of and addition of a polyA tail to the newly synthesized mRNA. For example, the nucleic acid may comprise a plurality of elements arranged as follows: RGMP-RG-polyA, where RGMP represents an RGM promoter, RG represents a reporter gene, and polyA represents a transcription terminator that when present in mRNA triggers cleavage and addition of polyA. Those of ordinary skill in the art are aware of suitable transcription terminators for use in cells of interest, e.g., mammalian cells. For example, the simian virus 40 (SV40) late polyadenylation signal (SVLPA) or the human or bovine growth hormone polyadenylation signal may be used.

In some embodiments a nucleic acid comprising a DNA methylation reporter is integrated into the genome of a cell in proximity to a DNA region of interest (ROI). In general, the ROI may be anywhere in the genome. In some embodiments the ROI is in a non-transcribed region of the genome. In some embodiments the ROI is no more than 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, or 50 kb away from a start site for transcription of an RNA (a transcription start site (TSS)). In some embodiments the RNA transcript is included in the NCBI RNA reference sequence collection (RefSeq), which is available on the worldwide web at subdomain ncbi.nlm.nih.gov/refseq (Pruitt K D, Tatusova T, Maglott D R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan. 1; 33 (Database issue):D501-4; Pruitt, K D, et al., Nucleic Acids Res. 2012 January; 40 (Database issue): D130-5. doi: 10.1093/nar/gkr1079; Pruitt, K D, et al., Nucleic Acids Res. 2014 January; 42 (Database issue):D756-63. doi: 10.1093/nar/gkt1114). RefSeq provides genomic, transcript, and coding sequences as well as gene annotations that include, among other things, TSSs for mammalian genes. Wherever relevant, a RefSeq sequence may be used for any genomic sequence, transcript, or protein sequence of interest herein.

In some embodiments the region of interest is a regulatory region of a gene of interest. In some embodiments the RGM construct is integrated at a location a distance of no more than 1 kb, 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, 15 kb, 20 kb, or 50 kb from a regulatory region of a gene. In some embodiments the location is a distance of no more than 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, 20 kb, or 50 kb from the 5′ end of an open reading frame. In some embodiments the location is a distance of no more than 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, 20 kb, or 50 kb from a CpG island. For purposes of the present disclosure, the “distance” between two locations in terms of nucleotides (i.e., the number of intervening nucleotides between the two locations) is calculated as follows: If one location is a single nucleotide and the other location is a region two or more nucleotides long, the number of intervening nucleotides is the number of nucleotides between the single nucleotide and the closer of the two terminal nucleotides of the other region. If both locations are regions two or more nucleotides long, the number of intervening nucleotides is the number of nucleotides between the closest terminal nucleotides of the two regions, i.e., the number of nucleotides that would need to be removed to make the two regions contiguous. A regulatory region may be any region that affects the level of transcription from the gene. Examples of regulatory regions include superenhancers, enhancers, and promoters. In some embodiments the ROI comprises a superenhancer, enhancer, or promoter. In some embodiments an RGM construct is integrated into a superenhancer, enhancer, or promoter. In some embodiments the ROI is a distal regulatory region, which term refers to a regulatory region outside the promoter region of a gene. In some embodiments the ROI is not an imprinting control region (ICR). In some embodiments the ROI is an imprinting control region. In some embodiments the ICR is IG-DMR or H19-DMD.
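The distance definition above can be expressed as a short Python sketch. It assumes that each location is given by 1-based, inclusive start and end coordinates on the same chromosome, with a single nucleotide represented as a region whose start equals its end. The treatment of overlapping regions (distance of zero) is an assumption, since the definition above addresses only non-overlapping locations. The names `Region`, `intervening_nucleotides`, and `within_kb` are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive (start == end for a single nucleotide)


def intervening_nucleotides(a: Region, b: Region) -> int:
    """Number of nucleotides lying strictly between the two locations."""
    for region in (a, b):
        if region.start > region.end:
            raise ValueError("region start must not exceed region end")
    if a.end < b.start:
        return b.start - a.end - 1
    if b.end < a.start:
        return a.start - b.end - 1
    return 0  # assumption: overlapping locations have zero distance


def within_kb(a: Region, b: Region, max_kb: float) -> bool:
    """True if the two locations are no more than max_kb kilobases apart."""
    return intervening_nucleotides(a, b) <= max_kb * 1000


# Hypothetical coordinates: an integration site and a CpG island.
integration_site = Region(10_500, 10_500)
cpg_island = Region(12_001, 12_800)
print(intervening_nucleotides(integration_site, cpg_island))  # 1500
print(within_kb(integration_site, cpg_island, 2))              # True
```

In this example the nucleotides at positions 10,501 through 12,000 lie between the two locations, giving a distance of 1,500 nucleotides, which is within 2 kb but not within 1 kb.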

In general, a gene of interest may be any gene. In some embodiments the gene of interest encodes a protein. In some embodiments the gene of interest encodes a transcription factor, a transcriptional co-activator or co-repressor, an enzyme, a receptor, a secreted protein, a transmembrane protein, a histone, a peripheral membrane protein, a soluble protein, a nuclear protein, a mitochondrial protein, a lysosomal protein, a growth factor, a cytokine, an interferon, a chemokine, a hormone, an extracellular matrix protein, a motor protein, a cell adhesion molecule, a major or minor histocompatibility (MHC) protein, a transporter, a channel, an immunoglobulin (Ig) superfamily (IgSF) gene, a tumor necrosis factor, an NF-kappaB protein, an integrin, a cadherin superfamily member, a selectin, a clotting factor, a complement factor, a plasminogen, a plasminogen activating factor, a proto-oncogene, an oncogene, a tumor suppressor gene, a chaperone, a heat shock factor, or a heat shock protein. In some embodiments the gene encodes a DNA modifying enzyme or a histone modifying enzyme. In some embodiments the gene encodes a kinase, a phosphatase, a GTPase, or an ATPase. In some embodiments the gene encodes a long non-coding RNA, which term refers to an RNA at least 200 nt long that is not a microRNA precursor. In some embodiments the gene encodes a microRNA precursor. In some embodiments the gene is an imprinted gene. In most embodiments the gene is not an imprinted gene.

It will be appreciated that, in general, an RGM construct comprising a promoter derived from a mammalian imprinted gene is used as a reporter for methylation of a region of the genome that is not the promoter region or gene body of the imprinted gene from which the promoter in such construct is derived. Thus, in general, the region of interest is not the promoter region or gene body of an imprinted gene from which the promoter in the RGM construct was derived. For example, if the RGM construct comprises a Snrpn promoter, the construct is typically not integrated into the Snrpn promoter or gene body nor used to report on methylation state of the endogenous Snrpn promoter region or gene body. In some embodiments an RGM construct comprising an imprinted gene promoter from a mammalian imprinted gene is integrated into the genome on a different chromosome or different chromosome arm from that which naturally contains the imprinted gene. In some embodiments an RGM construct comprising an imprinted gene promoter from a mammalian imprinted gene is integrated into the genome on the same chromosome or chromosome arm as that which naturally contains the imprinted gene, but is integrated at least 20, 40, 60, 80, 100, 150, 200, 300, 400, or 500 kb away from the imprinted gene promoter or gene body of the imprinted gene. In general, an RGM construct comprising a promoter derived from a mammalian imprinted gene is used as a reporter for methylation of a region of the genome that is not the ICR that controls imprinting of the imprinted gene from which the promoter in such construct is derived. Thus, in general, the region of interest is not the ICR of an imprinted gene from which the promoter in the RGM construct was derived. For example, if the RGM construct comprises a Snrpn promoter, the construct is typically not integrated into the ICR that controls imprinting of the Snrpn gene nor used to report on methylation state of such ICR.

In some embodiments the ROI is a repetitive element such as a tandem repeat (e.g., satellite DNA) or an interspersed repeat such as a LINE or a SINE (e.g., an Alu sequence). In some embodiments the ROI is within up to about 10 kb, 20 kb, or 50 kb from a telomere or centromere. In some embodiments the ROI comprises a tissue-specific DMR, reprogramming-specific DMR, or disease-specific DMR. In some embodiments the ROI comprises a secondary DMR or germline-derived DMR. In some embodiments the ROI comprises an imprinting control region.

In some embodiments a ROI is any DNA region that, based on standard methylation analysis, has been found to normally be hypomethylated in the genome of cells of one or more cell types or cell states, e.g., cells of the same cell type or cell state as that of a cell into which a nucleic acid comprising an RGM construct is introduced. In some embodiments an RGM construct integrated into such a region may be used to detect that the ROI is aberrantly hypermethylated and/or to detect an increase in methylation of the region, e.g., to a hypermethylated state. In some embodiments an RGM construct integrated into such a region may be used to detect that the ROI has a normal methylation state and/or remains stably hypomethylated.

In some embodiments a ROI is any DNA region that, based on standard methylation analysis, has been found to normally be hypermethylated in the genome of cells of one or more cell types or cell states, e.g., cells of the same cell type or cell state as that of a cell into which a nucleic acid comprising an RGM construct is introduced. In some embodiments an RGM construct integrated into such a region may be used to detect that the ROI is aberrantly hypomethylated and/or to detect a decrease in methylation of the region, e.g., to a hypomethylated state. In some embodiments an RGM construct integrated into such a region may be used to detect that the ROI has a normal methylation state and/or remains stably hypermethylated.

In some embodiments the ROI is a superenhancer or enhancer that is active (i.e., able to enhance transcription of one or more genes) in ES cells and/or iPS cells but is not active in somatic cells. In some embodiments the ROI is a superenhancer or enhancer that is active in somatic cells of one or more cell types but is not active in ES cells and/or iPS cells. In some embodiments the ROI is a superenhancer or enhancer that is active in an adult stem cell, e.g., a hematopoietic stem cell, neural stem cell, intestinal stem cell, mammary stem cell, mesenchymal stem cell, olfactory stem cell, or neural crest stem cell, but is not active in at least one other type of adult stem cell. In some embodiments the ROI is a superenhancer or enhancer that is active in an adult stem cell, e.g., an adult stem cell of any of the foregoing types, but is not active in at least one type of more differentiated cell to which the adult stem cell can give rise. In some embodiments the ROI is a superenhancer or enhancer that is active in a first differentiated cell type but is not active in at least one, several, most, or essentially all other differentiated cell types. In some embodiments the ROI is a CGI or CGI shore. In some embodiments the ROI is a low CpG region. In some embodiments the low CpG region is outside of CGI shores. In some embodiments the low CpG region is a region at least 200, 500, 1000, or 2000 nt long that has no more than half the density of CpGs of a CGI. In some embodiments the ROI is a region that is differentially bound by one or more DNA binding proteins (e.g., a transcription factor, CTCF) in cells of at least two different cell types or cell states. In some embodiments the ROI is a disease-specific DMR. In some embodiments the ROI is a tissue-specific DMR.
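The low CpG criterion above (a minimum length together with no more than half the CpG density of a CGI) can be illustrated with a minimal Python sketch. The function names, the use of CpG dinucleotides per nucleotide as the density measure, and the choice of a single reference CGI sequence are assumptions made for illustration only; they are not taken from this disclosure.

```python
def cpg_density(seq: str) -> float:
    """Return the number of CpG dinucleotides per nucleotide of sequence.

    Density measure is an illustrative assumption.
    """
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    return seq.count("CG") / len(seq)


def is_low_cpg_region(region: str, reference_cgi: str, min_length: int = 200) -> bool:
    """Check the two criteria: minimum length, and a CpG density that is
    no more than half that of a reference CGI (hypothetical reference)."""
    if len(region) < min_length:
        return False
    return cpg_density(region) <= 0.5 * cpg_density(reference_cgi)


if __name__ == "__main__":
    reference = "CG" * 100      # toy CpG-dense sequence standing in for a CGI
    candidate = "ACGT" * 50     # toy 200-nt sequence with half that density
    print(is_low_cpg_region(candidate, reference))
```

In practice the reference density would come from annotated CGIs in the genome assembly being used, and the minimum length would be set to 200, 500, 1000, or 2000 nt as desired.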

In some embodiments the ROI is a promoter that is active (i.e., able to drive transcription of one or more genes) in ES cells and/or iPS cells but is not active in somatic cells. In some embodiments the ROI is a promoter that is active in somatic cells of one or more cell types but is not active in ES cells and/or iPS cells. In some embodiments the ROI is a promoter that is active in an adult stem cell, e.g., a hematopoietic stem cell, neural stem cell, intestinal stem cell, mammary stem cell, mesenchymal stem cell, olfactory stem cell, or neural crest stem cell, but is not active in at least one other type of adult stem cell. In some embodiments the ROI is a promoter that is active in an adult stem cell, e.g., an adult stem cell of any of the foregoing types, but is not active in at least one type of more differentiated cell to which the adult stem cell can give rise. In some embodiments the ROI is a promoter that is active in a first differentiated cell type but is not active in at least one, several, most, or essentially all other differentiated cell types.

In some embodiments the region of interest is in an autosome, and the genome of the mammalian cell comprises two copies (alleles) of the region of interest, one on each of two homologous autosomes. The ROI may be on any autosome or may be on the X or Y chromosome in various embodiments. In some embodiments the cell comprises a nucleic acid comprising an RGM construct integrated into its genome in proximity to only one of the two alleles of the region of interest. In some embodiments the nucleic acid comprising an RGM construct is integrated into the paternal allele of the ROI. In some embodiments the nucleic acid comprising an RGM construct is integrated into the maternal allele of the ROI. In some embodiments the genome of the cell comprises two nucleic acids each comprising an RGM construct, one nucleic acid integrated into each allele of the region of interest. When only one allele of a gene or region of DNA is genetically modified, this may be referred to as “monoallelic modification”. When both alleles of a gene or region of DNA are genetically modified, this may be referred to as “biallelic modification”. Any of the genetic modifications described herein may be monoallelic or biallelic in various embodiments. The reporter genes in the RGM constructs in the case of biallelic modification may encode the same reporter molecule or different reporter molecules. In some embodiments cells having a biallelic modification with RGM constructs that encode distinguishable reporter molecules may be used to compare the timing of methylation or demethylation of the two alleles of an ROI, e.g., as a cell undergoes a cell identity or cell state transition.

One of ordinary skill in the art can locate transcription start sites, gene bodies, exons, introns, histone modifications (methylation, acetylation), CGIs, CGI shores, promoters, enhancers, superenhancers and/or sites of DNA methylation or DMRs that have been identified using standard methods for methylation analysis in the genome of a species of interest using publicly available databases and resources such as the UCSC Genome Browser (available on the World Wide Web at genome.ucsc.edu/; see, e.g., Kent, W., et al., The human genome browser at UCSC. Genome Research 2002; 12:996-1006 and/or Rosenbloom K R, et al., The UCSC Genome Browser database: 2015 update. Nucleic Acids Res. 2015; 43 (Database issue): D670-81). For example, the human assemblies GRCh37/hg19 or GRCh38/hg38 or the mouse (Mus musculus) assemblies NCBI37/mm9 or GRCm38/mm10, or subsequent genome assemblies, may be used. One of ordinary skill in the art can design homology arms, guide RNAs, or TALENs to direct integration of a nucleic acid comprising an RGM construct in proximity to a region of interest.
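As a rough illustration of the homology arm design step, the following Python sketch extracts the sequences flanking a chosen integration position and assembles a donor layout of left arm, cassette, and right arm. The arm length, the function names, and the zero-based coordinate convention are hypothetical choices made for this sketch; real designs would also account for guide RNA or TALEN target sites, repeats, and sequence polymorphisms.

```python
def homology_arms(locus_seq: str, insert_pos: int, arm_len: int = 500):
    """Return (left_arm, right_arm) flanking a zero-based insertion position.

    arm_len is an illustrative default, not a recommended value.
    """
    if not 0 <= insert_pos <= len(locus_seq):
        raise ValueError("insert_pos lies outside the supplied locus sequence")
    left = locus_seq[max(0, insert_pos - arm_len):insert_pos]
    right = locus_seq[insert_pos:insert_pos + arm_len]
    return left, right


def build_donor(locus_seq: str, insert_pos: int, cassette: str, arm_len: int = 500) -> str:
    """Assemble a donor sequence: left arm + reporter cassette + right arm."""
    left, right = homology_arms(locus_seq, insert_pos, arm_len)
    return left + cassette + right


if __name__ == "__main__":
    locus = "A" * 10 + "C" * 10          # toy locus sequence
    print(build_donor(locus, 10, "GG", arm_len=4))
```

The locus sequence would typically be retrieved from one of the genome assemblies mentioned above; the cassette argument stands in for the nucleic acid comprising the RGM construct.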

In some aspects described herein is a collection of mammalian cells or cell lines (a library of cells or cell lines), each comprising an RGM construct integrated at a different location in the genome of the cell. The locations may be at least 10 kb apart on average. In some embodiments the library comprises at least 500, 1000, 5000, 10000, 20000, 50000, 100000 or more cells or cell lines. The locations may be random or may be selected. In some embodiments the library comprises members in which the RGM construct is integrated within about 10 kb or about 5 kb of a TSS for each of at least 10000, 20000, or more RefSeq genes. The cells could be of any cell type in various embodiments. In some embodiments they are ES or iPS cells or fibroblasts. In some embodiments such a library could be used to develop a genome-wide profile of methylation state changes during cell state changes such as differentiation or reprogramming.
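One way to assess how many genes a library covers, in the sense of having an RGM construct integrated within a chosen distance of a TSS, is sketched below in Python. The data structures (a list of integration sites and a mapping from gene name to TSS coordinate) and the function name are assumptions for illustration; coordinates and gene names in the usage example are invented.

```python
from bisect import bisect_left
from collections import defaultdict


def genes_with_nearby_reporter(sites, tss_by_gene, window=10_000):
    """Return the set of genes whose TSS lies within `window` bp of at
    least one integration site on the same chromosome.

    sites:        iterable of (chromosome, position) integration sites
    tss_by_gene:  dict mapping gene name -> (chromosome, TSS position)
    """
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    covered = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        positions = by_chrom.get(chrom, [])
        # First site at or beyond the left edge of the window, if any.
        i = bisect_left(positions, tss - window)
        if i < len(positions) and positions[i] <= tss + window:
            covered.add(gene)
    return covered


if __name__ == "__main__":
    sites = [("chr1", 1_000), ("chr2", 50_000)]
    tss = {"geneA": ("chr1", 9_000), "geneB": ("chr1", 30_000), "geneC": ("chr2", 45_000)}
    print(sorted(genes_with_nearby_reporter(sites, tss)))
```

Setting `window` to 5000 corresponds to the narrower 5 kb criterion mentioned above.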

In some aspects described herein is a collection (library) of nucleic acids each comprising an RGM construct comprising homology arms homologous to sequences flanking different locations in the genome of a mammalian cell. In some embodiments the library of nucleic acids comprises at least 500, 1000, 5000, 10000, 20000, 50000, 100000 or more nucleic acids comprising different homology arms homologous to sequences flanking different locations in the genome of a mammalian cell.

In some embodiments an RGM construct may be used to detect a difference in the level of methylation of a ROI between two cells or populations of cells. In some embodiments a difference is about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% in the level of methylation of the ROI. In some embodiments the difference is at least 20%, or at least 50%.

In some embodiments two RGM constructs may be used to detect a difference in the level of methylation of two different ROIs in the same cell or population of cells or in different cells or populations of cells. In some embodiments a difference is about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% in the level of methylation of the two ROIs. In some embodiments an RGM construct may be used to detect a change in the level of methylation of a ROI. The change may be an increase or decrease. In some embodiments a change is an increase by about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% from a level of about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% methylation (up to a maximum of 100% methylation). In some embodiments a change is a decrease by about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% from a level of about 100%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, or 10% (down to a minimum of 0% methylation). In some embodiments the magnitude of the change is at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, or at least 80%. In some embodiments a change is an increase from a level of about 5% or less methylation to a level of about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more, an increase from a level of about 5%-10% methylation to a level of about 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more, an increase from a level of about 10%-20% methylation to a level of about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more, an increase from a level of about 20%-30% methylation to a level of about 35%, 40%, 50%, 60%, 70%, 80%, 90%, or more, an increase from a level of about 30%-40% methylation to a level of about 45%, 50%, 60%, 70%, 80%, 90%, or more, an increase from a level of about 40%-50% methylation to a level of about 55%, 60%, 70%, 80%, 90%, or more, an increase from a level of about 50%-60% methylation to a level of about 65%, 70%, 80%, 90%, or more, an increase from a level of about 60%-70% methylation to a level of about 75%, 80%, 90%, or more, or an increase from a level of about 70%-80% methylation to a level of about 85%, 90% or more. In some embodiments a change is a decrease from a level of about 90% or more methylation to a level of no more than 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 85%, a decrease from a level of about 80%-90% methylation to a level of no more than 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, or 75%, a decrease from a level of about 70%-80% methylation to a level of no more than 5%, 10%, 20%, 30%, 40%, 50%, 60%, or 65%, a decrease from a level of about 60%-70% methylation to a level of no more than 5%, 10%, 20%, 30%, 40%, 50%, or 55%, a decrease from a level of about 50%-60% methylation to a level of no more than 5%, 10%, 20%, 30%, 40%, or 45%, a decrease from a level of about 40%-50% methylation to a level of no more than about 5%, 10%, 20%, 30%, or 35% methylation, a decrease from a level of about 30%-40% methylation to a level of no more than 5%, 10%, 20%, 30%, or 35% methylation, a decrease from a level of about 20%-30% methylation to a level of no more than 5%, 10%, or 15%, or a decrease from a level of about 10%-20% methylation to a level of no more than 5% methylation.

In some embodiments an RGM construct may be used to determine the percentage or number of cells in a population of cells that exhibit a selected level or range of levels of methylation of a ROI, e.g., the percentage or number of cells in which the ROI is hypermethylated, or the percentage or number of cells in which the ROI is hypomethylated. The selected level may be about 0%, 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more. In some embodiments an RGM construct may be used to determine the percentage or number of cells in a population of cells that exhibit a selected change in the level of methylation of a ROI, e.g., the percentage or number of cells in which the ROI changes from hypermethylated to hypomethylated, or vice versa, over a given period of time. The change may be about 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more in various embodiments. In some embodiments the level of the reporter molecule and thus the level of methylation of the ROI may be measured using flow cytometry, e.g., FACS. In some embodiments flow cytometry, e.g., FACS, may be used to separate a population of cells into 2, 3, 4, 5, or more subpopulations based on the level of the reporter molecule. The subpopulations may be further analyzed or compared using any conventional method for analyzing cells, such as gene expression profiling (e.g., using microarrays or RNA-Seq), analysis of chromatin marks (e.g., using ChIP-Seq or ChIP-chip), protein expression profiling, morphological analysis, etc. In some embodiments, cells that are isolated based on expression level of the reporter molecule (e.g., low or absent expression, or robust expression) are further maintained (e.g., in culture) for a period of time and analyzed again for the reporter molecule.
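The division of a cell population into subpopulations by reporter level can be sketched computationally as follows. This is a simplified Python illustration, assuming that per-cell reporter intensities have already been exported from the cytometer and that gate boundaries have been chosen by the user; the function name and the threshold values in the usage example are hypothetical.

```python
from bisect import bisect_right


def gate_cells(intensities, thresholds):
    """Return the percentage of cells falling into each gate.

    intensities: per-cell reporter signal values
    thresholds:  gate boundaries; n boundaries define n + 1 subpopulations,
                 ordered from lowest to highest reporter level
    """
    thresholds = sorted(thresholds)
    counts = [0] * (len(thresholds) + 1)
    for value in intensities:
        # Index of the gate whose range contains this value.
        counts[bisect_right(thresholds, value)] += 1
    total = len(intensities)
    return [100.0 * c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    signal = [5, 50, 500, 5000]             # toy per-cell intensities
    print(gate_cells(signal, [10, 1000]))   # low / intermediate / high gates
```

How a given gate maps onto a methylation state depends on the promoter used; for a promoter that becomes less active when the ROI is methylated, the lowest gate would correspond to the most highly methylated cells.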

Reporter Molecules

A wide variety of reporter molecules may be used in the reporter constructs described herein. In some embodiments, the reporter molecule in an RGM construct or other reporter construct described herein is compatible with detection in individual, living mammalian cells. In some embodiments the reporter molecule is substantially non-toxic to mammalian cells when expressed at levels appropriate for its detection in a method described herein. In some embodiments, detection of the reporter molecule does not require cell lysis or permeabilization. In some embodiments the reporter molecule does not generate a detectable permanent change in the cell. In such embodiments, once the reporter molecule has been degraded or otherwise removed, the fact that the reporter molecule had been produced in the cell is no longer evident. Such reporter molecules are considered “reversible” and, in some embodiments, may be used to report on multiple cycles of methylation and demethylation of the region of interest. In some embodiments the reporter molecule creates a permanent, heritable change in the genome of the cell. In such embodiments, the fact that the reporter molecule had been produced in the cell remains evident even after the reporter molecule has been degraded or otherwise removed. Such reporter molecules are considered “irreversible”. In some embodiments, an irreversible reporter molecule may be useful for lineage tracing or other settings in which it is desired to be able to permanently mark a cell and/or its progeny (descendants) based on methylation state of the region of interest.

In some embodiments, detection of the reporter molecule comprises detection of light emitted by the reporter molecule or by a chemical reaction catalyzed by the reporter molecule. For example, in some embodiments, the reporter molecule in an RGM construct or other reporter construct described herein comprises a fluorescent or bioluminescent protein or a luciferase. Such proteins are well known in the art and include both naturally occurring proteins and engineered variants designed to have one or more altered properties relative to the naturally occurring protein, such as increased photostability, increased pH stability, increased fluorescence or light output, reduced tendency to dimerize, oligomerize, aggregate or be toxic to cells, an altered absorption/emission spectrum (in the case of a fluorescent protein), altered emission spectrum (in the case of a luciferase or luminescent protein), and/or altered substrate utilization (in the case of a luciferase).

Fluorescent proteins include, e.g., green fluorescent protein (GFP) from the jellyfish Aequorea victoria, and related proteins comprising chromophores that emit green light or light of different colors such as red, yellow, blue, and cyan. Many of these proteins are found in marine animals such as Hydrozoa and Anthozoa species, crustaceans, and comb jellies. Examples of fluorescent proteins that may be used include, e.g., GFP, EGFP, Sirius, Azurite, EBFP2, BFP, mTurquoise, ECFP, Cerulean, mTFP1, mUkG1, mAG1, AcGFP, mWasabi, EmGFP, YFP, EYFP, Topaz, SYFP2, Venus, Citrine, mKO, mKO2, mOrange, mOrange2, LSSmOrange, PSmOrange, PSmOrange2, mStrawberry, mRuby, mCherry, mRaspberry, tdTomato, mKate, mKate2, mPlum, mNeptune, T-Sapphire, mAmetrine, mKeima, E2-Orange, E2-Red/Green, E2-Crimson, and ZsGreen. See, e.g., Chalfie, M. and Kain, S R (eds.) Green fluorescent protein: properties, applications, and protocols (Methods of biochemical analysis, v. 47) Wiley-Interscience, Hoboken, N.J., 2006; Chudakov, D M, et al., Physiol Rev. 90(3):1103-63, 2010; and US Pat. Pub. Nos. 20030170911, 20060194282, 20070099175, 20090203035, 20100227400, 20100184954, 20110020784, and 20140237632 for further description of various reporter molecules that may be used.

As used herein, a “far red fluorescent protein” is a FP that has an emission maximum between 625 nm and 680 nm. Examples include mPlum, mNeptune, and E2-Crimson. In some embodiments a far red FP is a derivative of DsRed. As used herein, an “infrared fluorescent protein” is a FP that has an emission maximum above 680 nanometers (nm), e.g., between 680 nm and 900 nm. In some embodiments, an infrared fluorescent protein has an emission maximum above 700 nm, e.g., between 700 nm and 750 nm, between 750 nm and 800 nm, or between 800 nm and 900 nm. Without wishing to be bound by any theory, a far red or infrared protein may prove particularly advantageous for performing imaging in intact animals (e.g., intact mice) or tissue slices due, for example, to the ability of far red and infrared light to penetrate through tissue more efficiently than light of shorter wavelengths. In some embodiments, a reporter molecule for in vivo imaging emits light at wavelengths near or above 650 nm. In some embodiments an infrared protein is a variant of a naturally occurring phytochrome. Phytochromes are photosensory receptors found in plants, fungi, bacteria and cyanobacteria that absorb light in the red and far-red part of the spectrum and utilize linear tetrapyrrole bilins, such as biliverdin IXα (BV), phycocyanobilin or phytochromobilin, as chromophores. Bacterial phytochromes, also termed bacteriophytochrome photoreceptors (BphPs), use BV as a chromophore. Infrared fluorescent proteins derived by engineering BphPs (e.g., Rhodopseudomonas palustris BphP such as RpBphP2) include IFP1.4 (Shu, X. et al. Science 324, 804-807 (2009)), IFP2.0 (Yu, D., et al., Nature Communications (2013); 5:3626, DOI: 10.1038/ncomms4626), iRFP (Filonov, G. S. et al. Nat. Biotechnol. (2011), 29, 757-761), IFPrev (Bhattacharya S, et al., J Biol Chem. 2014; 289(46):32144-52), iRFP670, iRFP682, iRFP702, iRFP713 and iRFP720 (Shcherbakova D M and Verkhusha V V; Nat Methods. 2013; 10(8):751-4), and Wi-Phy (Auldridge M E, et al., J Biol Chem. 2012 Mar. 2; 287(10):7000-9). PAiRFP1 and PAiRFP2 are infrared fluorescent proteins derived from AtBphP2, from Agrobacterium tumefaciens (Piatkevich K D, et al., Nat Commun. 2013; 4:2153. doi: 10.1038/ncomms3153; Piatkevich, K., et al., Chem. Soc. Rev., 2013, 42, 3441).
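The wavelength-based definitions above can be expressed as a small Python sketch. The function name, the label used for proteins outside both ranges, and the treatment of a value of exactly 680 nm as far red are assumptions made here for illustration; the emission values in the usage example are arbitrary numbers, not measured properties of any particular protein.

```python
def classify_fluorescent_protein(emission_max_nm: float) -> str:
    """Classify a fluorescent protein by its emission maximum, following
    the ranges given in the text (far red: 625-680 nm; infrared: above 680 nm)."""
    if emission_max_nm > 680:
        return "infrared"
    if emission_max_nm >= 625:
        return "far red"
    return "other"   # label for shorter wavelengths is an assumption


if __name__ == "__main__":
    for nm in (510, 650, 720):   # arbitrary example emission maxima
        print(nm, classify_fluorescent_protein(nm))
```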

In some embodiments a photocontrollable fluorescent protein may be used as a reporter molecule. Photocontrollable fluorescent proteins (PFPs) are FPs whose fluorescence is regulated by light irradiation of specific wavelengths. They include photoactivators and photoswitchers. Photoactivators convert from a non-fluorescent to a bright fluorescent state and can be either irreversible or reversible. Photoswitchers change their fluorescent state and emit at a different wavelength upon exposure to transient but intense light. Examples of PFPs include PAGFP, PSCFP2, KFP, Kaede, mEosFP, mEos3.1, mEos3.2, Dronpa, Dendra2, KikGR, and PAmCherry1. These and other FPs are described in further detail in Nowotschin S, et al. (2009) Trends Biotechnol 27(5): 266-276 and/or Shcherbakova, D M, et al., Annu Rev Biophys. 2014; 43: 303-329, and references therein.

As used herein, “luciferase” refers to members of a class of enzymes that catalyze reactions that result in production of light. Luciferases have been identified in and cloned from a variety of organisms including fireflies, click beetles, sea pansy (Renilla), marine copepods, and bacteria among others. Examples of luciferases that may be used as reporter proteins include, e.g., Renilla (e.g., Renilla reniformis) luciferase, Gaussia (e.g., Gaussia princeps) luciferase, Metridia luciferase, firefly (e.g., Photinus pyralis) luciferase, click beetle (e.g., Pyrearinus termitilluminans) luciferase, and deep sea shrimp (e.g., Oplophorus gracilirostris) luciferase. “Luciferin” is used herein to refer to any substrate utilized by a luciferase in a light-emitting reaction. Firefly luciferin and coelenterazine are examples. Coelenterazine is the substrate for many luciferases and photoproteins including Renilla, Gaussia, and Metridia luciferases.

In some embodiments a variant of a naturally occurring luciferase that provides higher light output than the naturally occurring form and/or is capable of utilizing an analog of a naturally occurring luciferin as a substrate can be used. See, e.g., Loening, A M, et al., Protein Engineering, Design and Selection (2006) 19 (9): 391-400, for examples. NanoLuc (NL) is an engineered luciferase variant (Hall, M P, et al., ACS Chem Biol. 2012; 7(11):1848-57). Furimazine is an analog of coelenterazine optimized as a substrate for NL. The luciferase system encoded by the bacterial luciferase gene cassette (lux) has the ability to synthesize and/or scavenge all of the substrate compounds required for production of light and can therefore be used as a reporter molecule without the need to provide a luciferin. It has been codon optimized for expression in mammalian cells and successfully used to image cells in cell culture and in small animal imaging (Close D., et al., J. Biomed. Opt. 2011; 16:e12441; Close, D., et al., Sensors (Basel). 2012; 12(1):732-52).

In some embodiments, the reporter molecule in an RGM construct or other reporter construct described herein is detectable based on its effect on expression of a second reporter gene or third reporter gene in the cell. The second reporter molecule may be a directly detectable reporter molecule such as a FP or luciferase. Expression of the reporter molecule in the RGM construct will generally turn expression of the second reporter gene on or off, thereby allowing detection of the activity of the promoter in the RGM construct and thus detection of the methylation state of the region of interest in proximity to the RGM construct in the genome. For example, in some embodiments, the reporter molecule in an RGM construct comprises a site-specific recombinase, such as Cre, Flp, or other site-specific recombinase. In some embodiments, the reporter molecule in an RGM construct comprises a repressor protein, such as the Tet repressor. The use of a site-specific recombinase as a reporter molecule allows for the creation of irreversible, heritable genome modifications upon activity of the reporter molecule. Such genome modifications can result in permanent expression of a second reporter molecule by a cell and its descendants, thereby allowing for lineage tracing.

In some aspects, use of a site-specific recombinase as a reporter molecule can convert a transient or permanent change in methylation state of a region of interest to a permanent and heritable change in the cell. However, if the RGM reporter molecule is a FP, luciferase, or other molecule that degrades or is diluted over time without creating a permanent and heritable change in the cell, then the RGM construct can report on methylation changes in a reversible manner, e.g., it can report on an increase in methylation of the ROI at a first time point and then report on a decrease in methylation of the ROI at a second time point, or vice versa. In other words, in such embodiments, production of the reporter molecule encoded by the RGM construct can track the methylation state of the ROI, so that if the methylation state of the ROI changes over time, the level of the reporter molecule likewise changes.

In some embodiments in which the reporter molecule in an RGM construct comprises a site-specific recombinase, the genome of the cell into which the RGM construct is introduced typically comprises or is modified to comprise a sequence encoding a second reporter molecule that is not produced in the absence of activity of the site-specific recombinase but is produced upon a recombination event mediated by the site-specific recombinase. Thus, when the RGM promoter is active, the recombinase is produced, leading to production of the second reporter molecule. The recombination event may be removal of a sequence or may be inversion of a sequence. For purposes of description it will be assumed that the site-specific recombinase is Cre, but it should be understood that other site-specific recombinases may be used in a similar manner. In some embodiments the genome of the cell comprises a second (or third, fourth, etc.) reporter construct comprising a promoter (e.g., a constitutive promoter), a second reporter gene, and a loxP-STOP-loxP sequence (a STOP cassette). In some embodiments a nucleic acid comprising a STOP cassette and a second reporter gene is integrated into the genome of the cell downstream of an endogenous promoter. In some embodiments the nucleic acid comprising a STOP cassette and second reporter gene is integrated into the mouse Rosa26 gene locus or the human AAVS1 locus or another safe harbor locus. The STOP cassette is appropriately positioned in the second reporter construct or in the genome of the cell so that the reporter molecule encoded by the second reporter gene is not produced unless the STOP sequence is removed (i.e., the reporter gene is “off”). For example, the STOP cassette may be positioned between the promoter and the second reporter gene. Those of ordinary skill in the art are aware of suitable STOP sequences. For example, in some embodiments the STOP sequence may comprise at least a polyadenylation signal and/or stop codon to block gene transcription and/or translation in the absence of Cre. Cre-mediated excision of the STOP cassette is irreversible, thereby allowing for permanent expression of the previously transcriptionally silent reporter gene. Once activated, expression of the second reporter molecule is independent of subsequent Cre expression or activity. The stable inheritance of the active second reporter gene by the progeny of the original cell in which the RGM promoter drove transcription of Cre allows for detection of all progeny of the original cell. In some embodiments, this allows for lineage tracing. In some embodiments, the genome of the cell comprises a sequence comprising a promoter and a sequence encoding a second reporter molecule that is not operably linked to the promoter but would become so operably linked upon recombinase-mediated inversion of the sequence or upon recombinase-mediated inversion of a sequence comprising the promoter. In other words, recombinase-mediated inversion would bring the sequence encoding the second reporter molecule into operable association with the promoter, resulting in expression of the second reporter molecule. As described above, once activated, expression of the second reporter molecule is independent of subsequent Cre expression or activity. The stable inheritance of the active second reporter gene by the progeny of the original cell in which the RGM promoter drove transcription of Cre allows for detection of all progeny of the original cell.
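The irreversible, heritable behavior of the recombinase-based system can be captured in a toy state model. The Python sketch below is a conceptual illustration only: the class and method names are hypothetical, recombination is treated as certain whenever the RGM promoter is active, and no kinetics, mosaicism, or incomplete excision are modeled.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Toy model of a cell carrying a loxP-STOP-loxP second reporter."""
    stop_cassette_present: bool = True

    def step(self, rgm_promoter_active: bool) -> bool:
        """Advance one time step; return True if the second reporter is expressed."""
        if rgm_promoter_active:
            # Recombinase is produced and excises the STOP cassette.
            # Excision is irreversible, so the flag is never restored.
            self.stop_cassette_present = False
        return not self.stop_cassette_present

    def divide(self):
        """Daughter cells inherit the recombined or unrecombined locus."""
        return Cell(self.stop_cassette_present), Cell(self.stop_cassette_present)


if __name__ == "__main__":
    cell = Cell()
    print(cell.step(False))   # promoter silent: second reporter off
    print(cell.step(True))    # promoter active: STOP excised, reporter on
    print(cell.step(False))   # promoter silent again: reporter stays on
    daughter_a, daughter_b = cell.divide()
    print(daughter_a.step(False), daughter_b.step(False))
```

The model shows why a transient period of RGM promoter activity is sufficient to mark a cell and all of its progeny, which is the property exploited for lineage tracing.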

In some embodiments, the RGM construct comprises an RGM promoter operably linked to a sequence that encodes a transcriptional repressor protein that is capable of binding to DNA in a sequence-specific manner and repressing transcription from a nearby promoter. The transcriptional repressor protein can act as a reporter molecule in the context of a cell that comprises binding sites for the repressor, as follows. In such embodiments, the cell into which the RGM construct is introduced typically comprises or is modified to comprise a second reporter construct, which comprises a sequence encoding a second reporter molecule, an operably linked promoter, and a binding site for the repressor protein, wherein the binding site is positioned such that the second reporter molecule is not produced in the presence of the repressor protein because the repressor protein binds to the binding site and inhibits transcription. Those of ordinary skill in the art are aware of suitable repressor proteins and the sequences in DNA to which they bind. For example, the Tet repressor (TetR), Lac repressor (LacR), or other bacterial, archaeal, fungal, plant, or other non-mammalian transcriptional repressor protein comprising a sequence-specific DNA binding domain (DBD) may be used. Furthermore, a DNA binding domain of a transcriptional repressor or activator (in the absence of a transcriptional activation domain) could serve as a transcriptional repressor.

The binding site(s) for the DBD may be positioned upstream from the promoter. In such embodiments, when the promoter in the RGM construct is active, the repressor protein is produced, and the second reporter molecule is not produced. When the promoter in the RGM construct is inactive, the repressor protein is not produced, and the second reporter molecule is produced. In some embodiments the second reporter molecule comprises a site-specific recombinase, and the genome of the cell into which the RGM construct is introduced comprises or is modified to comprise a sequence encoding a third reporter molecule that is not produced in the absence of activity of the site-specific recombinase but is produced upon a recombination event mediated by the site-specific recombinase. For example, the genome of the cell may comprise a third reporter construct comprising a promoter, a reporter gene, and a loxP-STOP-loxP sequence (a STOP cassette), arranged such that the reporter gene is not transcribed unless the STOP cassette is removed via a recombinase-mediated recombination event. Such a system allows for generating a permanent, heritable mark when the RGM promoter is inactive. Under conditions in which the RGM promoter is inactive, the repressor protein is not produced. Consequently, the recombinase is produced and mediates recombination to excise the STOP cassette, allowing expression of the third reporter molecule. The various systems described herein make it possible to permanently mark cells based on either activity or lack of activity of the RGM promoter. Thus, in some embodiments cells are permanently marked when the ROI is methylated, e.g., hypermethylated. In some embodiments cells are permanently marked when the ROI is demethylated, e.g., hypomethylated. For example, if the RGM promoter is one that responds to methylation of the ROI by becoming less active (e.g., the Snrpn promoter) and it is desired to mark cells in which the ROI becomes demethylated (e.g., is hypomethylated), then one could use a site-specific recombinase as the reporter molecule encoded by the RGM construct. Demethylation of the ROI would result in increased activity of the RGM promoter, which drives synthesis of a transcript encoding the recombinase. The recombinase mediates recombination to activate expression of a second reporter molecule (e.g., a FP or luciferase) from a second reporter construct integrated elsewhere in the genome (or on a stable episome). The active second reporter construct remains active in the cell and is inherited by the cell's progeny, thus marking them permanently. If the RGM promoter is one that responds to methylation of the ROI by becoming less active (e.g., the Snrpn promoter) and it is desired to mark cells in which the ROI becomes methylated (e.g., is hypermethylated), then one could use an RGM construct in which the RGM promoter drives synthesis of a transcript encoding a transcriptional repressor such as TetR. In the situation in which the ROI is initially hypomethylated, the RGM promoter would drive transcription of a transcript encoding TetR. TetR would bind to a TetO site of a second reporter construct, thereby blocking activity of the promoter in said second reporter construct. Methylation of the ROI would result in decreased activity of the RGM promoter, which would result in decreased synthesis of TetR. The promoter in the second reporter construct would then drive synthesis of a transcript encoding a site-specific recombinase, which would then activate (via a site-specific recombination event) synthesis of a third reporter molecule such as an FP or luciferase, which could then be detected.
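The two marking strategies described above can be compared with a simple Boolean model. In the Python sketch below, the RGM promoter is assumed to be active exactly when the ROI is unmethylated (as for a promoter that becomes less active upon methylation of the ROI), each step is treated as all-or-none, and the design labels and function name are hypothetical.

```python
def trace_marking(roi_methylated_series, design):
    """Return, for each time point, whether the cell carries the permanent mark.

    design = "recombinase": the RGM promoter drives the recombinase directly,
                            so cells are marked once the ROI is unmethylated.
    design = "repressor":   the RGM promoter drives a repressor (e.g., TetR) that
                            silences a recombinase-encoding second reporter
                            construct, so cells are marked once the ROI is methylated.
    """
    if design not in ("recombinase", "repressor"):
        raise ValueError("unknown design")
    marked = False
    history = []
    for methylated in roi_methylated_series:
        promoter_active = not methylated          # assumed promoter behavior
        if design == "recombinase":
            recombinase_made = promoter_active
        else:
            repressor_present = promoter_active
            recombinase_made = not repressor_present
        marked = marked or recombinase_made       # recombination is irreversible
        history.append(marked)
    return history


if __name__ == "__main__":
    series = [False, False, True, False]          # ROI becomes methylated at step 3
    print(trace_marking(series, "repressor"))
    print(trace_marking(series, "recombinase"))
```

Note that with the repressor design a cell whose ROI is already methylated at the outset is marked immediately, so in this simplified model the design is informative for cells that start with a hypomethylated ROI, as in the situation described above.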

A large number of DBDs and the sequences to which they bind are known in the art and can be used in various embodiments. Types of DBDs include, for example, helix-turn-helix, helix-loop-helix, zinc finger, leucine zipper, winged helix, winged helix-turn-helix, HMG-box, immunoglobulin fold, B3 domain, and TAL effector DBD. Naturally occurring DBDs are found in prokaryotic and eukaryotic organisms, e.g., bacteria, fungi (e.g., yeast), plants, invertebrates (e.g., insects), and vertebrates. In some embodiments, a full length naturally occurring DBD-containing protein is used. In other embodiments, a DBD-containing fragment or variant is used. For example, a transcriptional activation or repression domain may be deleted. Exemplary prokaryotic transcriptional regulator families include, e.g., the LysR, AraC/XylS, TetR, LuxR, Lac/GalR, ArsR, IclR, MerR, AsnC, MarR, NtrC, OmpR, DeoR, cold shock, GntR, and Crp families. See, e.g., Swint-Kruse, L and Matthews, K (2009) Current Opinion in Microbiology, 12(2): 129-137; Wilson, C J, et al. (2007) Cellular and Molecular Life Sciences, 64(1), 3-16; Culard, F., et al. (1987) Eur. Biophys. J. 14: 169-178; and Ramos, J L, et al. (2005) Microbiol. & Mol. Biol. Rev., 69(2): 326-356, and references in the foregoing.

A sequence-specific DBD binds preferentially to its sequence as compared with its binding to other DNA sequences. For example, the affinity of such a DBD for a DNA segment containing a binding site for the DBD can be, e.g., at least 10-fold, 100-fold, 1000-fold or more greater than its affinity for random DNA sequences. In some embodiments, the Kd for binding of a DBD to its binding site is less than about 10⁻⁶ M, less than about 10⁻⁷ M, less than about 10⁻⁸ M, less than about 10⁻⁹ M, less than about 10⁻¹⁰ M, less than about 10⁻¹¹ M, or less than about 10⁻¹² M. One of skill in the art will readily be able to obtain sequences of numerous DBDs and the sequences to which they bind. The binding site to which the DBD binds in a sequence-specific manner may be, e.g., from about 10-15 nt to about 40-50 nt long. Multiple copies of the binding site, e.g., between 2 and 10 copies, or more, may be used. In some embodiments the bacterial Tet repressor (TetR) or Lac repressor (LacR) may be used. LacR binds to the bacterial LacO sequence. TetR binds to the 19 bp bacterial TetO sequence (5′-TCCCTATCAGTGATAGAGA-3′) (SEQ ID NO: 7). In some embodiments two or more TetO sequences may be used. In some embodiments a tetracycline response element (TRE), which consists of 7 repeats of the TetO sequence separated by spacer sequences, may be used. Those of ordinary skill in the art are aware of suitable TetO sequences and variants thereof (see, e.g., Löw, R., et al. (2010) BMC Biotechnol. 10:81).
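As a non-limiting illustration, the following Python sketch assembles a repeat array of the TetO sequence given above and estimates binding-site occupancy from a Kd. The spacer (a run of N placeholders), the function names, and the use of a simple one-site equilibrium binding model are assumptions made for illustration and are not taken from the disclosure.

```python
# 19 bp TetO sequence as given in the text (SEQ ID NO: 7).
TETO = "TCCCTATCAGTGATAGAGA"


def build_repeat_array(site: str, copies: int, spacer: str) -> str:
    """Join `copies` copies of a binding site with a spacer between them."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return spacer.join([site] * copies)


def fraction_bound(dbd_conc_molar: float, kd_molar: float) -> float:
    """Occupancy of one site under a simple one-site equilibrium model."""
    return dbd_conc_molar / (dbd_conc_molar + kd_molar)


# Hypothetical TRE-like array: 7 TetO repeats separated by 10 nt placeholders.
tre_like = build_repeat_array(TETO, copies=7, spacer="N" * 10)

# Occupancy when free DBD concentration equals Kd (here 1 nM) is one half.
half_occupancy = fraction_bound(1e-9, 1e-9)
```

Under this model, a DBD present at a concentration equal to its Kd occupies the site half of the time, and lower Kd values give higher occupancy at the same concentration.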

Additional variations are within the scope of the disclosure. For example, in some embodiments an artificial transcriptional regulator may be used as a reporter molecule in an RGM construct. The term “artificial transcriptional regulator” refers to (a) a non-naturally occurring protein that comprises a sequence-specific DNA binding domain (DBD) or exhibits sequence-specific RNA-guided DNA binding and a transcriptional activation or repression domain or (b) a protein that (i) comprises a sequence-specific DNA binding domain or exhibits sequence-specific RNA-guided DNA binding and (ii) lacks a transcriptional activation or repression domain. The second type of artificial transcriptional regulator can reduce transcription by blocking RNA polymerase progression along the DNA template. In some embodiments the sequence-specific DNA binding domain is capable of specifically binding to a DNA sequence that does not occur naturally in the human or mouse genome. In some embodiments an artificial transcriptional regulator comprises the DBD of a TALE or ZFN but lacks the cleavage domain. In some embodiments an artificial transcriptional regulator comprises a modified Cas protein having mutations that render it catalytically inactive. An effector domain comprising a transcriptional activation domain (e.g., a multimer of the VP16 activation domain) or a transcriptional repression domain is fused to the DBD or catalytically inactive Cas protein. In some embodiments the DBD or catalytically inactive Cas (in the presence of an appropriate guide RNA) binds to binding sites in the vicinity of a promoter in DNA in a sequence-specific manner and activates or inhibits transcription from the promoter.

Other reporter molecules that may be used in certain embodiments include enzymes such as beta-galactosidase, alkaline phosphatase, or others that produce a colorimetric readout by, e.g., catalyzing the conversion of chromogenic substrates into colored products. In one embodiment the reporter is not chloramphenicol acetyltransferase (CAT).

In certain embodiments any sequence of interest can be operably linked to an RGM promoter (e.g., a Snrpn promoter) and introduced into a cell, e.g., integrated into the genome of the cell in proximity to an ROI, in order to render expression of the sequence regulatable based on the methylation state of the ROI. For example, the sequence could encode a protein or a functional RNA. In some embodiments the sequence may encode a shRNA that may inhibit expression of another gene in the cell. Certain embodiments of the disclosure are directed to such uses of an RGM promoter and to nucleic acids comprising an RGM promoter operably linked to any sequence of interest. The nucleic acid may further comprise any of the other components that are described herein in the context of an RGM construct. The sequence of interest may encode a protein or RNA that modulates cell type, cell state, or cell phenotype and/or may encode a therapeutic protein or RNA in certain embodiments. Examples of genes of interest are mentioned above. In some embodiments the sequence may encode a gene product of any of such genes.

In some embodiments, a reporter molecule with a half-life of about 45-60 minutes, about 60-75 minutes, or about 75-90 minutes may be used. Use of a reporter molecule with fast turnover kinetics makes the time window during which the reporter molecule is detected more closely match the activity of the promoter that directs its production, which may facilitate the ability to detect reversible changes in methylation state and/or may make it possible to detect changes in methylation state more rapidly than would otherwise be the case. In some embodiments, a reporter gene encodes an mRNA that comprises a sequence that destabilizes the mRNA (an “mRNA-destabilizing sequence”). In some embodiments, the mRNA-destabilizing sequence is an adenylate-uridylate-rich element (AU-rich element; ARE). AREs are cis-acting elements found in the 3′ untranslated region (UTR) of an estimated 5-8% of human mRNAs, including those encoding numerous cytokines, oncoproteins, and growth factors, and their presence generally accelerates mRNA turnover. ARE sequences are well known in the art (see, e.g., Wu, X & Brewer, G. Gene. 2012; 500(1): 10-21, and references therein). An exemplary ARE comprises 1-4 copies of the sequence UUAUUUAUU. In some embodiments a reporter gene encodes a reporter protein that comprises a sequence that destabilizes the protein (“protein destabilizing sequence”) such as a PEST sequence. A protein destabilizing sequence may destabilize a protein that contains it by targeting the protein for degradation via ubiquitin-mediated or ubiquitin-independent pathways.
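The effect of reporter half-life on how closely the signal tracks promoter activity can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that production of the reporter stops completely when the promoter is silenced and that the existing reporter then decays with first-order kinetics; the function names and the example half-lives are hypothetical.

```python
import math


def fraction_remaining(minutes_since_shutoff: float, half_life_min: float) -> float:
    """Fraction of reporter left after the promoter is silenced.

    Assumes first-order decay with no new synthesis.
    """
    return 0.5 ** (minutes_since_shutoff / half_life_min)


def minutes_to_fraction(target_fraction: float, half_life_min: float) -> float:
    """Time for the reporter to fall to `target_fraction` of its level."""
    return half_life_min * math.log(target_fraction) / math.log(0.5)


# Compare a destabilized reporter (60 min half-life) with a hypothetical
# stable reporter (24 h half-life), three hours after promoter silencing.
fast = fraction_remaining(180, 60)
slow = fraction_remaining(180, 24 * 60)
```

Under these assumptions, three hours after silencing the 60 minute reporter has fallen to one eighth of its starting level, whereas the hypothetical 24 hour reporter remains above 90%, so loss of promoter activity would be far slower to register.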

In some embodiments a DNA methylation reporter comprises a region that encodes a polypeptide that comprises a reporter protein and one or more additional proteins (e.g., one, two, three, or more additional proteins), wherein adjacent proteins are separated by regions comprising a self-cleaving 2A peptide. Self-cleaving 2A peptides (often referred to simply as “2A peptides”) mediate “ribosomal skipping” between proline and glycine residues in the peptide and inhibit peptide bond formation between these residues without affecting downstream translation. 2A peptides allow multiple proteins to be encoded by a polycistronic mRNA as a polyprotein, which dissociates into component proteins upon translation. Use of the term “self-cleaving” is not intended to imply a proteolytic cleavage reaction. Self-cleaving peptides are typically about 18-22 amino acids long and are found in members of the Picornaviridae virus family, including aphthoviruses such as foot-and-mouth disease virus (FMDV), equine rhinitis A virus (ERAV), Thosea asigna virus (TaV) and porcine teschovirus-1 (PTV-1) (Donnelly, M L, et al., J. Gen. Virol. 2001; 82, 1027-101; Ryan, M D, et al., 2001; J. Gen. Virol., 72, 2727-2732) and cardioviruses such as Theilovirus (e.g., Theiler's murine encephalomyelitis) and encephalomyocarditis viruses. Positioning a region that encodes a 2A peptide between two protein coding sequences allows the synthesis of two separate proteins by translation of a single mRNA, without requiring use of an IRES. Further description of 2A peptides and examples of their use to coexpress multiple proteins from a polycistronic mRNA are found in U.S. Patent App. Pub. No. 20120028821. The one or more additional proteins may comprise additional reporter proteins. In some embodiments open reading frames may be separated by IRES sequences to allow for production of a polycistronic transcript encoding multiple proteins from a single promoter.

In some embodiments a DNA methylation reporter comprises a region that encodes a polypeptide comprising a reporter protein linked to one or more additional proteins. The reporter protein and the one or more additional proteins are encoded by protein coding regions that are joined so as to form a single open reading frame. The polypeptide, which may be referred to as a “fusion protein” or “chimeric protein”, typically has functional properties conferred by each of its component proteins.

A protein that contains two or more regions or domains (e.g., two or more regions or domains that originate from different proteins, such as a fusion protein) may comprise a linker between any two or more of the domains or regions. A linker may serve to allow the regions or domains to fold independently and/or move flexibly in relation to each other. The linker region is typically a short polypeptide chain (e.g., 1-50 amino acids, e.g., 5-25 or 5-15 amino acids). The precise length and sequence are typically not critical. Small amino acid residues such as serine, glycine, and alanine are of use. Examples include (Gly)ₙ, (Gly-Ser)ₙ, ((Gly)₄Ser)ₙ, and (Gly-Ala)ₙ, wherein n is an integer and the total number of amino acids in the linker is typically between 1 and about 30, and variants in which any of the amino acid residues is repeated with the proviso that the total number of amino acids is within one of the aforementioned ranges.
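For illustration, repeat-unit linkers of the kind listed above can be written out in one-letter amino acid code with a short Python helper. The function name and the default upper bound of 30 residues (taken from the typical range stated above) are illustrative choices, not requirements of the disclosure.

```python
def repeat_linker(unit: str, n: int, max_len: int = 30) -> str:
    """Return `unit` repeated `n` times, e.g. 'GGGGS' x 3.

    Raises ValueError if the linker falls outside 1..max_len residues.
    """
    linker = unit * n
    if not 1 <= len(linker) <= max_len:
        raise ValueError(
            f"linker length {len(linker)} is outside the range 1-{max_len}"
        )
    return linker


# ((Gly)4Ser)3, a 15-residue flexible linker.
g4s_x3 = repeat_linker("GGGGS", 3)

# (Gly-Ala)5, a 10-residue linker.
ga_x5 = repeat_linker("GA", 5)
```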

In any of the embodiments described herein that involve a DNA methylation reporter that encodes a reporter protein and one or more additional proteins, any one or more of the additional proteins may be a reporter protein. For example, a DNA methylation reporter may encode two or more reporter proteins. In some embodiments one or more of the additional proteins can be any protein, the expression of which it is of interest to control in a manner that depends on methylation of the region of interest. The protein may be a transcription factor, transcriptional co-activator, enzyme, transporter, ion channel, etc.

In any of the embodiments described herein that involve two or more reporter molecules (e.g., two or more reporter proteins), the reporter molecules may be the same or different. For example, in embodiments in which a DNA methylation reporter encodes two or more reporter molecules, or in which a cell comprises two or more nucleic acids each encoding a reporter molecule, the reporter molecules may be the same or different. In some embodiments at least two of the reporter molecules are distinguishable from each other. Reporter molecules are distinguishable if they have distinguishable readouts or are detected using different techniques such that one can determine which molecule is being detected. In some embodiments two distinguishable reporter molecules may be: (a) first and second fluorescent proteins with distinct emission maxima; (b) a fluorescent protein and a luciferase; (c) a fluorescent protein and a site-specific recombinase, etc.

In some embodiments two or more distinct DNA methylation reporters that encode different reporter molecules are integrated into the genome of the cell. The two or more distinct reporters are typically integrated at different locations and can be used to detect the methylation state of two or more different regions of genomic DNA. In some embodiments the reporters encode reporter molecules that produce distinguishable readouts so that they can be independently detected. The two or more different reporter molecules may be of the same category (e.g., different fluorescent proteins) or different categories (e.g., a fluorescent protein and a luciferase). In some embodiments two or more fluorescent proteins with distinct emission spectra may be used. For example, a first fluorescent protein that emits green light and a second fluorescent protein that emits red light may be used. The locations at which the two or more reporters are integrated can be anywhere in the genome. They may be in the same chromosome or in different chromosomes. In some embodiments the reporters are integrated in proximity to the same region of interest in each of two homologous chromosomes (i.e., the two alleles of the region of interest present in diploid cells).

In some embodiments a reporter gene encodes a polypeptide that comprises a fragment of a reporter protein, wherein the fragment does not have the reporter activity characteristic of the full length reporter protein but is capable of physically associating with a second fragment of the reporter protein to form a functional reporter protein. The two fragments are said to “complement” each other and may be referred to as “complementation fragments” or members of a “complementation pair”. Any reporter protein that can be split into two parts and reconstituted non-covalently may be used in various embodiments. In some embodiments the reporter protein is an enzyme or a chromophore. The reporter protein is detectable if the two promoters that drive expression of the members of the complementation pair are active during overlapping time periods or at least sufficiently close together in time such that the proteins encoded by transcripts whose synthesis is directed by the promoters are present in the cell at the same time so that they can associate to form an active reporter molecule. Such activity may be referred to as “coincident activity”.
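The notion of coincident activity can be illustrated with a minimal Python sketch in which each complementation fragment is present from the time its promoter turns on until some persistence time after the promoter turns off. The function names, the time units, and the single fixed persistence value are hypothetical simplifications introduced only for this example.

```python
from typing import Tuple

Window = Tuple[float, float]


def fragment_window(on_start: float, on_end: float, persistence: float) -> Window:
    """Interval during which a fragment is present in the cell.

    Assumes the fragment appears when its promoter turns on and lingers
    for `persistence` time units after the promoter turns off.
    """
    return (on_start, on_end + persistence)


def coincident(a: Window, b: Window) -> bool:
    """True if the two presence windows overlap for a nonzero time."""
    return max(a[0], b[0]) < min(a[1], b[1])


# Promoter A is active from t=0 to t=10 and promoter B from t=11 to t=20.
# With a persistence of 2 units the fragments still coexist briefly.
with_persistence = coincident(
    fragment_window(0, 10, 2), fragment_window(11, 20, 2)
)
without_persistence = coincident(
    fragment_window(0, 10, 0), fragment_window(11, 20, 0)
)
```

In this sketch the two promoters are never active at the same moment, yet a reconstituted reporter is possible when the fragments persist long enough, which corresponds to activity that is "sufficiently close together in time".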

In some embodiments, a split reporter molecule may be used to indicate whether any two promoters of interest exhibit coincident activity in a given cell. In some embodiments, two DNA methylation reporters, each comprising an RGM promoter operably linked to a sequence encoding a complementation fragment of a split reporter molecule, are integrated in proximity to regions of interest in the genome of a cell. Detection of the reporter molecule indicates that both promoters are active. If either or both promoters are inactive (as a result of methylation of the region of interest), the complementation fragment encoded by the operably linked sequence is not produced, and the reporter molecule is not detected. In some embodiments a split reporter molecule may be used to indicate both the level of methylation of a ROI in a cell and that the cell is a member of a particular cell population that characteristically expresses a certain cell type specific marker or cell state specific marker. In such embodiments, a DNA methylation reporter comprising a sequence encoding a complementation fragment of a split reporter molecule is integrated in proximity to a region of interest in the genome of a cell. A DNA sequence encoding the other complementation fragment of the split reporter molecule is placed under control of the promoter that directs expression of the marker. The DNA sequence may be inserted into the genome under control of the endogenous promoter (i.e., the promoter that is naturally present in the cell in its natural location) or the promoter and DNA sequence may be in a construct that has been introduced into the cell or an ancestor of the cell and, in some embodiments, integrated into the genome. Those of ordinary skill in the art are aware of split reporter proteins and of fragments that can serve as complementation fragments. In some embodiments, a split recombinase, e.g., split Cre, is used as a split reporter protein. For example, amino acid residues 19-59 and 60-343 of Cre can be used as complementation fragments. In some embodiments, a split luciferase is used as a split reporter protein. In some embodiments a split fluorescent protein is used.

In some embodiments, reporter molecules that generate a detectable signal based on the occurrence of fluorescence resonance energy transfer (FRET) or bioluminescence resonance energy transfer (BRET) may be used. FRET is a distance-dependent interaction between the electronic excited states of two molecules in which excitation is transferred from a donor moiety to an acceptor moiety without emission of a photon, resulting in photon emission from the FRET acceptor. In order for FRET to occur the donor and acceptor should be in very close proximity, e.g., less than approximately 10 nm, and the absorption spectrum of the acceptor must overlap the fluorescence emission spectrum of the donor. BRET is analogous to FRET but uses a bioluminescent reporter molecule such as a luciferase as an energy donor and a fluorescent moiety, e.g., a biomolecule such as GFP, as the acceptor, thus eliminating the need for an excitation light source (see Pfleger, K. and Eidne, K., Nature Methods, 3(3), 165-174, 2006, for a review). In a typical BRET assay, oxidation by the donor of a suitable substrate results in transfer of energy to the acceptor, resulting in photon emission by the acceptor. A pair of reporter molecules capable of generating a detectable signal based on FRET or BRET may be referred to as a FRET or BRET pair, respectively. FRET or BRET pairs may be used to indicate whether any two promoters of interest exhibit coincident activity in a given cell. In some embodiments, a first DNA methylation reporter comprising a methylation-sensitive promoter operably linked to a first member of a FRET or BRET pair is integrated in proximity to a first region of interest in the genome of a cell. A second DNA methylation reporter comprising a methylation-sensitive promoter operably linked to the second member of the FRET or BRET pair is integrated in proximity to a second region of interest in the genome of the cell.
Detection of the FRET signal (fluorescence) or BRET signal (bioluminescence), respectively, indicates that both promoters are active. If either or both regions of interest are hypermethylated, the promoter of the associated DNA methylation reporter construct is inactive, and the FRET or BRET signal is not detected. In some embodiments a FRET or BRET pair may be used to indicate both that a region of interest is hypomethylated in a cell and that the cell is a member of a particular cell population that characteristically expresses a certain marker. In such embodiments, a DNA methylation reporter comprising a sequence encoding a first member of a FRET or BRET pair is integrated in proximity to a region of interest in the genome of a cell. A DNA sequence encoding the other member of the FRET or BRET pair is placed under control of the promoter that directs expression of the marker. The DNA sequence may be inserted into the genome under control of the endogenous promoter or the promoter and DNA sequence may be in a construct that has been introduced into the cell or an ancestor of the cell and, in some embodiments, integrated into the genome. In some embodiments a FRET or BRET pair may be used to indicate whether a cell that comprises a DNA methylation reporter is a member of a particular cell population that characteristically expresses two different markers. In such embodiments a DNA sequence encoding a first member of the FRET or BRET pair is placed under control of the promoter that directs expression of the first marker, and a second DNA sequence encoding the other member of the FRET or BRET pair is placed under control of the promoter that directs expression of the second marker.
In some embodiments either or both DNA sequences may be inserted into the genome under control of their respective endogenous promoters. In some embodiments one or more construct(s) that comprise the promoter operably linked to the DNA sequence may be introduced into the cell or an ancestor of the cell and, in some embodiments, integrated into the genome. Those of ordinary skill in the art are aware of reporter molecules that can be used as FRET or BRET pairs. For example, CFP/YFP variants and GFP/RFP variants can be used for FRET, and RLuc/YFP variants can be used for BRET, to name a few.
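The distance dependence of FRET noted above is commonly described by the Förster relation, in which transfer efficiency falls off with the sixth power of the donor-acceptor distance. The Python sketch below evaluates that relation; the Förster radius of 5 nm is an assumed, illustrative value and the function name is hypothetical.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float = 5.0) -> float:
    """Forster relation: E = 1 / (1 + (r / R0)**6).

    R0 (the Forster radius) is the distance at which efficiency is 50%.
    The default of 5 nm is an assumption chosen for illustration.
    """
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# Efficiency at half R0, at R0, and at the ~10 nm limit mentioned above.
close = fret_efficiency(2.5)
at_r0 = fret_efficiency(5.0)
far = fret_efficiency(10.0)
```

With the assumed radius, efficiency is 50% at 5 nm and falls below 2% at 10 nm, consistent with the requirement that donor and acceptor be in very close proximity.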

In some embodiments the complementation fragments or FRET/BRET pair members are each fused to proteins or protein domains that have high affinity for each other and are prone to bind to each other when present in a cell. Such proteins or protein domains may be referred to as “interaction domains”. Binding of the interaction domains with each other brings the complementation fragments close together, thereby increasing the likelihood that they will associate and reconstitute an active reporter protein. Numerous proteins are known to contain protein interaction domains. Such proteins, or the interaction domains that mediate their dimerization (dimerization domains), may be used. For example, the dimerization domains of transcription factors or receptors that function as dimers may be used. In some embodiments the dimerization domain is a coiled coil domain. In some embodiments the coiled coil domain comprises a leucine zipper. For example, a polypeptide comprising at least the leucine zipper of a transcription factor may be used. In some embodiments the transcription factor is the yeast transcription factor Gcn4.

Other types of reporter systems useful for detecting coincident activity of two or more promoters are also within the scope of the present disclosure. For example, a reporter system that comprises multiple gene products (such as those encoded by the bacterial lux operon) may be used. DNA sequences that encode different gene products can be placed under control of different promoters in order to report on coincident activity of the different promoters. It should be understood that reporter systems that can be used to report on coincident activity of two promoters can alternately or additionally be used to report on the activity of a single promoter by using a constitutive promoter as one of the two promoters so that the complementation fragment-encoding RNA whose synthesis is under control of the promoter is generally produced regardless of the cell type or other conditions. In such instances, the presence of an active, reconstituted reporter depends on the activity of the promoter that drives transcription of the other complementation fragment. It should also be understood that one or more reporter molecules useful for detecting coincident activity of two promoters may be used in combination with one or more other reporter molecules in the same cell, cell population, or organism. For example, a split reporter molecule may be used to detect coincident activity of two promoters that direct transcription of RNA encoding different markers, and a different reporter molecule may be used to detect the methylation state of a single region of interest in the same cell.

In some embodiments a protein, e.g., a reporter protein or targetable nuclease or site-specific recombinase, comprises a cellular targeting signal. The term “cellular targeting signal” refers to a peptide that, when present in a protein expressed by a cell, directs the protein to a particular region in a cell (e.g., a particular type of organelle or cell structure) or directs the protein for secretion. In some embodiments the cellular targeting signal is a nuclear localization signal (NLS), which is a cellular targeting signal that directs proteins to the nucleus. A NLS often comprises one or more sequences of five basic, positively-charged amino acids. In some embodiments the cellular targeting signal is a signal peptide (also termed a secretion signal sequence), which is a cellular targeting signal that directs a protein that contains it to the secretory pathway. The protein may be secreted or may be retained at the plasma membrane as a membrane-bound (e.g., transmembrane) protein.

In some embodiments, a reporter protein is targeted to the plasma membrane as a membrane-bound protein comprising an extracellular domain. In some embodiments the extracellular domain can interact with an extracellular substance, such as an enzyme substrate, a detectable label (e.g., a small molecule fluorophore), or an affinity reagent, or even with another cell. Cellular targeting signals are found in numerous naturally occurring proteins, and such sequences (or variants or consensus sequences derived therefrom) may be appended to or inserted into other proteins in order to direct those proteins to a desired location. Those of ordinary skill in the art are familiar with cellular targeting sequences and their use and will be able to select and use a suitable cellular targeting sequence for purposes of targeting a protein to a desired subcellular location or for secretion or retention as a transmembrane protein. For example, in some embodiments an SV40 NLS may be used to target a protein to the nucleus. In some embodiments a cellular targeting signal that directs the protein to be retained at the plasma membrane comprises the transmembrane domain of a transmembrane protein, such as CD4.

Some naturally occurring reporter proteins may contain signal sequences capable of directing secretion of the protein in mammalian cells. For example, Gaussia luciferase contains such a sequence. In some embodiments such a sequence may be at least in part removed or modified to reduce or abolish its ability to direct secretion in mammalian cells.

One of ordinary skill in the art will readily be able to obtain nucleic acid sequences encoding reporter molecules described herein. It will be understood that due to the degeneracy of the genetic code, a protein sequence may be encoded by any of a wide variety of different nucleic acid sequences. A nucleic acid sequence that encodes a reporter molecule or other polypeptide to be expressed in a cell which is of a different species to that in which the nucleic acid is naturally found (i.e., to which it is native) may be modified in any of a variety of ways relative to the naturally occurring sequence. Such modification may be performed, e.g., in order to increase the level of expression of the polypeptide, cause the polypeptide to be localized to a particular region or organelle of the cell, cause the polypeptide not to be localized to a particular region or organelle of the cell, cause the polypeptide to be secreted or not to be secreted, etc. Due to redundancy in the genetic code, which allows amino acids to be encoded by multiple different codons, a given polypeptide can be encoded by numerous different nucleic acid sequences. However, different organisms may use some codons encoding a particular amino acid more effectively than other codons that encode the same amino acid. The efficiency of protein translation in a non-native cell can be increased by altering the codon usage to more closely reflect preferred codon usage of the non-native cell while still encoding the same gene product, i.e., the coding sequence may be codon optimized. In some embodiments a nucleic acid sequence that has been codon optimized for expression in mammalian cells, e.g., mouse cells or human cells, may be used as a reporter gene in a reporter construct of the present disclosure or for expressing any protein in the context of the present disclosure.
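As a simplified illustration of codon optimization, the Python sketch below replaces codons with synonymous codons from a preference table while leaving the encoded protein unchanged. The standard genetic code is used for translation; the preference table covers only two amino acids and is a hypothetical placeholder rather than measured codon usage data, and the function names are likewise illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Hypothetical preference table (amino acid -> preferred codon).
# Amino acids not listed keep their original codon.
PREFERRED = {"L": "CTG", "G": "GGC"}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict) -> str:
    """Swap each codon for the preferred synonymous codon, if one is given."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        recoded.append(preferred.get(CODON_TO_AA[codon], codon))
    return "".join(recoded)


original = "ATGTTAGGTTAA"  # encodes Met-Leu-Gly-stop
optimized = recode(original, PREFERRED)
```

The recoded sequence differs at the nucleotide level but translates to the same protein, which is the defining property of a codon-optimized coding sequence. Practical optimization would use species-specific usage tables and typically considers additional factors not modeled here.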

Nucleic acids (e.g., vectors) comprising sequences that encode reporter molecules (such as luciferase, fluorescent proteins, targetable nucleases) are available from a variety of sources such as Addgene, Clontech, Promega, and others. For example, numerous plasmids containing sequences coding for the reporter molecules (e.g., various FPs and luciferases) described herein or others known in the art are available. In some embodiments a promoter in such a plasmid, which would ordinarily drive expression of the reporter molecule, is replaced by an RGM promoter, e.g., a Snrpn promoter. In some embodiments a sequence encoding a reporter molecule, a sequence comprising an RGM promoter, and other sequences (if desired) such as donor nucleic acid, etc., may be inserted into a cloning vector such as a TOPO cloning vector (e.g., pCR2 series), Gateway cloning vector, or the like. Nucleic acids and vectors described herein can be produced using any of the various methods known in the art for producing nucleic acid constructs. For example, they may be chemically synthesized, produced in suitable host cells, produced using PCR, etc. In some embodiments a mammalian imprinted gene promoter or portion thereof, a DMR or portion thereof, a DNA region of interest, or homologous sequences useful as donor DNA (e.g., homology arms) may be amplified from genomic DNA, e.g., using PCR, and inserted into a vector upstream of a reporter gene. It will be appreciated that nucleic acids described herein can be assembled from individual components using restriction enzymes, ligation, PCR, or other standard methods known in the art.

In general, a reporter molecule may be detected using any suitable detection method and/or apparatus known in the art. One of ordinary skill in the art will be able to select a suitable method and apparatus depending on factors such as the properties of the particular reporter molecule, the conditions and goals of the assay, etc. A fluorescent molecule may be detected using a fluorimeter, flow cytometry, or fluorescence microscopy. Fluorescence-activated cell sorting (FACS) may be used to analyze and/or sort cells based on fluorescence. In the luciferase reaction, light is emitted when luciferase acts on the appropriate luciferin. Photon emission can be detected by light sensitive apparatus such as a luminometer or various optical microscopes. Microplate readers, scanning spectroscopy, and microscopes coupled to charge-coupled device (CCD) cameras may be used. In some embodiments stimulated emission depletion (STED) microscopy may be used. Suitable instrumentation systems are available to automate detection of signals from intact cells, including automated fluorescence imaging and automated microscopy systems.

In some embodiments the reporter molecule may be detected in a biological sample obtained from a subject, e.g., a living subject, e.g., a living rodent. In some embodiments the biological sample comprises intact, living cells. In some embodiments the biological sample comprises an organ or tissue slice, e.g., a brain tissue slice (e.g., a hippocampal slice) or other organ or tissue slice.

In some embodiments an RGM construct is used to detect methylation in a cell-based model of an isolated organ or tissue. For example, in some embodiments cells comprising an RGM construct integrated into their genome are cultured in or on a three-dimensional scaffold. In some embodiments the scaffold comprises a hydrogel. In some embodiments the scaffold comprises a polymer. In some embodiments a polymer is a synthetic polymer, e.g., PEG. In some embodiments a polymer is a naturally occurring or synthetic polypeptide or polysaccharide. In some embodiments cells of interest comprise hepatocytes, myocytes (e.g., cardiomyocytes), or neurons. In some embodiments cells comprise fibroblasts. For example, hepatocytes and fibroblasts may be co-cultured. In some embodiments a scaffold comprises substances that may provide a supportive microenvironment for cells associated therewith. Such substances may include, e.g., growth factors and extracellular matrix (ECM) components such as ECM proteins or portions thereof (e.g., RGD-containing peptides). In some embodiments Matrigel® is used. In some embodiments an engineered in vitro model of parenchymal tissue (e.g., human liver) is used. See, e.g., PCT/US2006/020019 (WO2006127768) or Khetani S R, Bhatia S N. Nat Biotechnol. 2008; 26:120-126, for examples.

In some embodiments cells are in an isolated organoid, embryoid body, spheroid, or other three-dimensional structure. Organoid refers to a three-dimensional cellular structure that resembles an organ or tissue of the body. In general, organoids comprise multiple differentiated cell types that are found in the relevant organ or tissue in vivo and reproduce the spatial morphology and cell-cell interactions as found in that organ or tissue. In some embodiments an organoid is an epithelial organoid. In some embodiments an organoid is a brain organoid or liver organoid. Methods for preparing organoids are known to those of ordinary skill in the art. In some embodiments an RGM construct is used to detect methylation in cells in cultured skin. In some embodiments the methylation state of a region of interest, e.g., a superenhancer, enhancer, or promoter of a cell type specific gene or cell state specific gene, is detected or monitored as a tissue or organ develops in vivo or in an organoid, embryoid body, etc.

In some embodiments the reporter molecule may be detected in a living subject, e.g., a living mouse or other rodent. A variety of imaging methods can be used for in vitro and/or in vivo imaging, such as in vivo luminescence imaging, fluorescence imaging, magnetic resonance imaging, two-photon laser scanning microscopy (TPLSM) (Zinselmeyer, B. H. et al. Methods Enzymol 461, 349-378), photoacoustic imaging (Krumholz, A., et al. Sci Rep. 2014; 4:3939), single photon emission computed tomography (SPECT), or positron emission tomography (PET). Those of ordinary skill in the art are aware of suitable systems and methods for performing in vivo imaging for detection of reporter molecules in a living subject. For example, in some embodiments the IVIS Imaging System (Xenogen, Carlsbad, Calif.) may be used. It will be understood that if luciferase expression is to be measured, an appropriate luciferin substrate is administered to the subject. If a photoactivatable reporter molecule is used, cells will be exposed to light of the appropriate wavelength.

Using the teachings of the present disclosure a suitable reporter molecule with an appropriate sensitivity and/or dynamic range for a given application (e.g., use in vitro or in vivo) can be selected. In some embodiments a baseline level of the reporter molecule that corresponds to a given level or range of levels of methylation may be determined and used as a reference level.

Targetable Nucleases and Uses Thereof

In some embodiments an RGM construct is integrated into the genome in proximity to a region of interest in the genome using a targetable nuclease. Targetable nucleases generate DNA breaks in the genome at a selected target site and can be used to produce precise genomic modifications. DNA breaks, e.g., double-stranded DNA breaks, can be repaired by various DNA repair pathways. Non-homologous end joining (NHEJ) ligates the broken ends together, sometimes with insertion or deletion of one or more nucleotides at the site of the break. Homologous recombination (HR) mediated repair (also termed homology-directed repair (HDR)) uses homologous donor DNA as a template to repair the break. If the sequence of the donor DNA differs from the genomic sequence, this process leads to the introduction of sequence changes into the genome. Precise modifications to the genome can be made by providing donor DNA comprising an appropriate sequence. Modifications that can be generated using targetable nucleases include insertions, deletions, or substitutions of one or more nucleotides, or introducing an exogenous DNA segment such as an expression cassette (a nucleic acid comprising a sequence to be expressed and appropriate expression control elements, such as a promoter, to cause the sequence to be expressed in a cell) or tag at a selected location in the genome.

There are currently four main types of targetable nuclease in use: zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), RNA-guided nucleases (RGNs) such as the Cas proteins of the CRISPR/Cas Type II system, and engineered meganucleases. ZFNs and TALENs comprise the nuclease domain of the restriction enzyme FokI (or an engineered variant thereof) fused to a site-specific DNA binding domain (DBD) that is appropriately designed to target the protein to a selected DNA sequence. In the case of ZFNs, the DNA binding domain comprises a zinc finger DBD. In the case of TALENs, the site-specific DBD is designed based on the DNA recognition code employed by transcription activator-like effectors (TALEs), a family of site-specific DNA binding proteins found in plant-pathogenic bacteria such as Xanthomonas species. The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Type II system is a bacterial adaptive immune system that has been modified for use as an RNA-guided endonuclease technology for genome engineering. The bacterial system comprises two endogenous bacterial RNAs called crRNA and tracrRNA and a CRISPR-associated (Cas) nuclease, e.g., Cas9. The tracrRNA has partial complementarity to the crRNA and forms a complex with it. The Cas protein is guided to the target sequence by the crRNA/tracrRNA complex, which forms an RNA/DNA hybrid between the crRNA sequence and the homologous sequence in the target. For use in genome modification, the crRNA and tracrRNA components are often combined into a single chimeric guide RNA (sgRNA or gRNA) in which the targeting specificity of the crRNA and the properties of the tracrRNA are combined into a single transcript that localizes the Cas protein to the target sequence so that the Cas protein can cleave the DNA. The sgRNA often comprises an approximately 20 nucleotide guide sequence complementary to the desired target sequence followed by about 80 nt of hybrid crRNA/tracrRNA.
One of ordinary skill in the art appreciates that the guide RNA need not be perfectly complementary to the target sequence. For example, in some embodiments it may have one or two mismatches. The genomic target sequence should also be immediately followed by a Protospacer Adjacent Motif (PAM) sequence. The PAM sequence is present in the DNA target sequence but not in the sgRNA sequence. The Cas protein will be directed to any DNA sequence with the correct target sequence followed by the PAM sequence. The PAM sequence varies depending on the species of bacteria from which the Cas protein was derived. In some embodiments, the targetable nuclease comprises a Cas9 protein. For example, Cas9 from Streptococcus pyogenes (Sp), Neisseria meningitidis, Staphylococcus aureus, Streptococcus thermophilus, or Treponema denticola may be used. The PAM sequences for the Cas9 proteins of S. pyogenes, N. meningitidis, S. thermophilus, and T. denticola are NGG, NNNNGATT, NNAGAA, and NAAAAC, respectively. A number of engineered variants of the site-specific nucleases have been developed and may be used in certain embodiments. For example, engineered variants of Cas9 and FokI are known in the art. Furthermore, it will be understood that a biologically active fragment or variant can be used. Other variations include the use of hybrid targetable nucleases. For example, in CRISPR RNA-guided FokI nucleases (RFNs) the FokI nuclease domain is fused to the amino-terminal end of a catalytically inactive Cas9 (dCas9) protein. RFNs act as dimers and utilize two guide RNAs (Tsai, S Q, et al., Nat Biotechnol. 2014; 32(6): 569-576). Site-specific nucleases that produce a single-stranded DNA break are also of use for genome editing. Such nucleases, sometimes termed “nickases”, can be generated by introducing a mutation (e.g., an alanine substitution) at key catalytic residues in one of the two nuclease domains of a targetable nuclease that comprises two nuclease domains (such as ZFNs, TALENs, and Cas proteins).
Examples of such mutations include D10A, N863A, and H840A in SpCas9 or at homologous positions in other Cas9 proteins. A nick can stimulate HDR at low efficiency in some cell types. Two nickases, targeted to a pair of sequences that are near each other and on opposite strands, can create a single-stranded break on each strand (“double nicking”), effectively generating a DSB, which can be repaired by HDR using a donor DNA template (Ran, F. A. et al. Cell 154, 1380-1389 (2013)).
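
By way of illustration only, the target-site rule described above for SpCas9 (an approximately 20 nucleotide target sequence immediately followed by an NGG PAM) can be expressed as a simple sequence scan. The following Python sketch is a hypothetical example and not a method disclosed herein; the function names, the fixed guide length, and the coordinate convention are assumptions, and the sketch does not assess mismatches, off-target sites, or uniqueness of a site in a genome.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_spcas9_sites(seq, guide_len=20):
    """List candidate sites: guide_len nt immediately followed by an NGG PAM.

    Returns tuples (strand, start, protospacer, pam). For the "-" strand,
    start is an offset within the reverse-complemented sequence, which is
    an assumption of this sketch rather than a standard convention.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites

# Hypothetical input: 20 nt of A followed by the PAM "TGG".
example_sites = find_spcas9_sites("A" * 20 + "TGG")
```

In this sketch the PAM is reported separately from the protospacer, consistent with the statement above that the PAM is present in the DNA target sequence but not in the sgRNA sequence.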

The term “donor nucleic acid” or “donor” refers to an exogenous nucleic acid segment that, when provided to a cell, e.g., along with a targetable nuclease, can be used as a template for DNA repair by homologous recombination and thereby cause site-specific genome modification (sometimes termed “genome editing”). The modifications can include insertions, deletions, or substitutions of one or more nucleotides, or introducing an exogenous DNA segment such as an expression cassette or tag at a selected location in the genome. A donor nucleic acid typically comprises sequences that have homology to the region of the genome at which the genomic modification is to be made. The donor may contain one or more single base changes, insertions, deletions, or other alterations with respect to the genomic sequence, so long as it has sufficient homology to allow for homology-directed repair. In some embodiments a donor nucleic acid may comprise sequences (sometimes termed “homology arms”) flanking a sequence to be introduced into the genome. The homology arms are homologous to genomic sequences flanking a location in genomic DNA at which the insertion is to be made.

Donor nucleic acid can be provided, for example, in the form of DNA plasmids, PCR products, or chemically synthesized oligonucleotides, and may be double-stranded or single-stranded in various embodiments. The size of the donor nucleic acid can vary from as small as about 40 base pairs (bp) to about 10 kilobases (kb), or more. In some embodiments the donor nucleic acid is between about 1 kb and about 5 kb long. In some embodiments the homology arms are between about 100 bp-200 bp, about 200 bp-300 bp, about 300 bp-400 bp, about 400 bp-500 bp, about 500 bp-750 bp, about 750 bp-1000 bp, about 1 kb-1.5 kb, or more. The two homology arms may be about the same length (e.g., within 50-100 bp of each other) or may differ in length by more than 100 bp. Either or both homology arms could independently fall within any of the afore-mentioned ranges. One of ordinary skill in the art appreciates that the homology arms need not be perfectly homologous to the genomic DNA. In some embodiments the homologous region(s) of a donor nucleic acid have at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or more sequence identity to a genomic sequence with which homologous recombination is desired. One of ordinary skill in the art also appreciates that the homology need not extend all the way to the DNA break. For example, in some embodiments the homology begins no more than 100 bp away from the break, e.g., between 1 and 100 bp away, e.g., 1-50 bp away, e.g., 1-15 bp away, from the break.
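
By way of illustration only, the arrangement of homology arms flanking a sequence to be inserted, and a simple percent identity calculation, can be sketched in Python as follows. The function names, the toy sequences, and the use of an ungapped position-by-position comparison are assumptions made for this example; in practice sequence identity would typically be determined with an alignment tool, and arm lengths would fall within the ranges described above rather than the very short arms used here.

```python
def build_donor(genomic_seq, cut_index, insert_seq, arm_len):
    """Assemble a donor: left homology arm + insert + right homology arm.

    cut_index is the position of the break (the insert is placed between
    genomic_seq[cut_index - 1] and genomic_seq[cut_index]). In this sketch
    the homology arms extend all the way to the break.
    """
    if cut_index - arm_len < 0 or cut_index + arm_len > len(genomic_seq):
        raise ValueError("homology arm would extend past the end of the sequence")
    left_arm = genomic_seq[cut_index - arm_len:cut_index]
    right_arm = genomic_seq[cut_index:cut_index + arm_len]
    return left_arm + insert_seq + right_arm

def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)

# Hypothetical toy locus and insert (real arms would be far longer).
toy_locus = "ACGT" * 10
toy_donor = build_donor(toy_locus, cut_index=20, insert_seq="TTTT", arm_len=8)
```

A donor whose arms carry a small number of base changes relative to the genomic sequence would give a percent identity below 100 in this calculation while potentially remaining usable, consistent with the statement above that the homology arms need not be perfectly homologous.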

Those of ordinary skill in the art are aware of methods for performing site-specific genome modification using targetable nucleases and will be able to apply such methods to introduce a DNA methylation reporter into the genome at a location of choice or to create other genomic modifications. Those of ordinary skill in the art can, for example, design appropriate guide RNAs, TALENs, or ZFNs to generate a DNA break at a selected location in the genome, can design donor nucleic acid (e.g., comprising homology arms) to promote HDR at a DNA break generated by a targetable nuclease, and are aware of appropriate methods that can be used to introduce a targetable nuclease into cells and, where appropriate, a donor nucleic acid and/or guide RNA. A targetable nuclease may be targeted to a unique site in the genome of a mammalian cell by appropriate design of the nuclease or guide RNA. A nuclease or guide RNA may be introduced into cells by introducing a nucleic acid that encodes it into the cell. Standard methods such as plasmid DNA transfection, viral vector delivery, transfection with synthetic mRNA (e.g., capped, polyadenylated mRNA), or microinjection can be used. If DNA encoding the nuclease or guide RNA is introduced, the coding sequences should be operably linked to appropriate regulatory elements for expression, such as a promoter and termination signal. In some embodiments a sequence encoding a guide RNA is operably linked to an RNA polymerase III promoter such as a U6 or tRNA promoter. In some embodiments one or more guide RNAs and Cas protein coding sequences are transcribed from the same nucleic acid (e.g., plasmid). In some embodiments multiple guide RNAs are transcribed from the same plasmid or from different plasmids or are otherwise introduced into the cell. The multiple guide RNAs may direct Cas9 to different target sequences in the genome, allowing for multiplexed genome editing.
In some embodiments a nuclease protein (e.g., Cas9) may comprise or be modified to comprise a nuclear localization signal (e.g., SV40 NLS). A nuclease protein may be introduced into cells, e.g., using protein transduction. Nuclease proteins, guide RNAs, or both, may be introduced using microinjection. Methods of using targetable nucleases, e.g., to perform genome editing, are described in numerous publications, such as Methods in Enzymology, Doudna J A, Sontheimer E J. (eds), The use of CRISPR/Cas9, ZFNs, and TALENs in generating site-specific genome alterations. Methods Enzymol. 2014, Vol. 546 (Elsevier); Carroll, D., Genome Editing with Targetable Nucleases, Annu. Rev. Biochem. 2014. 83:409-39, and references in either of these. See also U.S. Pat. Pub. Nos. 20140068797, 20140186919, 20140170753 and/or PCT/US2014/034387 (WO/2014/172470).

Accordingly, in some aspects, described herein are methods of generating an engineered cell comprising introducing (a) a nucleic acid comprising an RGM construct and (b) a targetable nuclease into the cell under conditions suitable for the nucleic acid construct to serve as donor nucleic acid to integrate the RGM construct into the genome of the cell in proximity to an ROI. In some embodiments the targetable nuclease is a Cas protein and the method comprises introducing a guide RNA that directs the Cas protein to cleave the genome at a desired target location, e.g., in proximity to an ROI. In some embodiments a targetable nuclease is used to make one or more genetic modifications to the genome of a cell in addition to, or instead of, introducing an RGM construct into the genome. For example, in some embodiments a targetable nuclease is used to introduce an additional reporter construct at a site in the genome distinct from that at which the RGM construct is integrated. The additional reporter construct may be any of the additional reporter constructs described herein. In some embodiments, an additional reporter construct comprises a cell type specific regulatory element, e.g., a cell type specific promoter, operably linked to a reporter gene. In some embodiments a reporter gene is introduced into the genome such that it is placed in operable association with an endogenous regulatory element (i.e., a regulatory element that is naturally present in the cell and is in its normal position in the genome of the cell) such as an endogenous promoter. The endogenous regulatory element may be a cell-type specific or cell state specific regulatory element. The reporter molecule encoded by the additional reporter construct may be used to report on the cell identity or cell state of the cell. For example, in some embodiments expression of the reporter molecule indicates that a cell is of a certain type or is in a certain state.

In some embodiments a targetable nuclease is used to make a genetic modification at any site of interest in the genome of a cell that comprises an RGM construct or into which an RGM construct is introduced. For example, a targetable nuclease may be used to generate a mutation that is associated with a disorder, e.g., in order to create a model of the disorder. In some embodiments a targetable nuclease may be used to mutate a DNA or histone modifying enzyme, e.g., so as to reduce or abolish its activity.

In some embodiments multiple genomic modifications at different locations are generated together in a cell, e.g., by introducing multiple sgRNAs (e.g., 2, 3, 4, 5, or more), with or without one or more donor nucleic acids, into a cell. For example, two or more RGM constructs may be introduced into the genome in proximity to different regions of interest or in proximity to the two alleles of a region of interest, or an RGM construct and a cell type reporter construct may be introduced into the genome. Use of CRISPR/Cas systems to drive both non-homologous end joining (NHEJ) based gene disruption and homology directed repair (HDR) based precise gene editing to, among other things, achieve simultaneous targeting of multiple nucleic acid sequences in cells and nonhuman mammals is described in PCT/US2014/034387 (WO/2014/172470).

Cells or non-human organisms can be analyzed to identify those that have the desired modification(s) to their genome or confirm that a desired modification has occurred. Suitable methods for performing such analysis include restriction analysis, Southern blot, PCR analysis, or sequencing.

Cells

In some aspects, a cell comprising a DNA methylation reporter described herein is disclosed. In some embodiments, the cell comprises a nucleic acid construct or vector comprising a DNA methylation reporter described herein. In some embodiments a DNA methylation reporter is integrated into the genome of the cell. A DNA methylation reporter may be integrated in proximity to any region of DNA and used to evaluate the methylation state of the region.

In some embodiments the cell is a eukaryotic cell. In some embodiments, the cell is a vertebrate cell. In some embodiments, the cell is a mammalian cell. In some embodiments the mammalian cell is a eutherian mammalian cell. In some embodiments the mammalian cell is a human cell, a non-human primate cell, a rodent cell (e.g., a mouse, rat, hamster, or guinea pig cell), or rabbit cell. In some embodiments the mammalian cell is a bovine, ovine, caprine, equine, porcine, canine, or feline cell.

In some embodiments the cell is a stem cell. In some embodiments the cell is a pluripotent cell. A pluripotent cell may be an embryonic stem (ES) cell or an induced pluripotent stem (iPS) cell. In some embodiments the cell is a somatic cell. Somatic cells of interest herein are typically mammalian cells, such as, for example, human cells, primate cells, or rodent cells, e.g., mouse cells. They may be obtained by well-known methods and can be obtained from any organ or tissue containing live somatic cells, e.g., blood, bone marrow, skin, lung, pancreas, liver, stomach, intestine, heart, reproductive organs, bladder, kidney, urethra and other urinary organs, etc. Mammalian somatic cells include, but are not limited to, adipocyte (e.g., white fat cell or brown fat cell), cardiac myocyte, chondrocyte, endothelial cell, epidermal cells, epithelial cells, exocrine gland cell, fibroblast, glial cell, hematopoietic cells, hepatocyte, hair follicle cells, keratinocyte, macrophage, melanocyte, monocyte, mononuclear cell, myeloid cell, neuron, neutrophil, osteoblast, osteoclast, pancreatic islet cell (e.g., a beta cell), Sertoli cell, skeletal myocyte, smooth muscle cell, B cell, plasma cell, T cell (e.g., regulatory, cytotoxic, helper), or dendritic cell. The term “somatic cells”, as used herein, also includes adult stem cells. An adult stem cell is a cell that is capable of giving rise to all cell types of a particular tissue. Exemplary adult stem cells include hematopoietic stem cells, neural stem cells, and mesenchymal stem cells. In some embodiments the cell is an adult stem cell, e.g., a hematopoietic stem cell, neural stem cell, intestinal stem cell, stem cell, or mammary stem cell.

Differentiation is the process by which a less specialized cell becomes a more specialized cell type. Differentiation often occurs in stages in which cells become more specified over a series of cell divisions until they reach full maturity, which may be referred to as “terminal differentiation”. A somatic cell may be partially or completely differentiated. Cell differentiation can involve changes in the size, shape, polarity, metabolic activity, gene expression and/or responsiveness to signals of the cell. For example, hematopoietic stem cells differentiate to give rise to all the blood cell types including those of the myeloid lineage (monocytes and macrophages, neutrophils, basophils, eosinophils, erythrocytes, megakaryocytes/platelets, dendritic cells) and lymphoid lineage (T-cells, B-cells, NK-cells). During progression along the path of differentiation, the differentiation potential of a cell (the range of cells into which a cell can develop) typically becomes more restricted.

In some embodiments a cell is a progenitor cell. As used herein, a “progenitor cell” is a cell that has a more restricted differentiation potential than an adult stem cell or pluripotent cell but can both self-renew and give rise to daughter cells that are more differentiated than itself. In some embodiments the cell is a terminally differentiated cell, meaning that the cell normally lacks the capacity to give rise to cells that are more differentiated than itself.

In some embodiments a cell is a germline cell, also referred to as a germ cell. Germline cells are any line of cells that give rise to gametes (eggs and sperm). In many animals, including mammals, the germ cells originate in the primitive streak and migrate to the developing gonads. There, they undergo cell division of two types, mitosis and meiosis, followed by differentiation into mature gametes, either eggs or sperm. Germ cells include primordial germ cells (PGCs), gametogonia, and gametocytes. In some embodiments a cell is a gamete. In some embodiments a cell is a zygote or a cell in or obtained from an embryo having no more than 2, 4, 8, 16, 32, or 64 cells.

In some embodiments the cell is a normal cell. In some embodiments the cell is an abnormal cell. An abnormal cell may have a defect in one or more biological processes and/or may exhibit one or more phenotypes that are distinct from those found in a normal matched cell. In some embodiments the cell harbors a mutation or genetic variation that is associated with a disorder. In some embodiments a mutation or genetic variation associated with a disorder is one that occurs more frequently in individuals who have the disorder than in individuals who do not have the disorder. The mutation or genetic variation may be recognized in the art as causing or contributing to the disorder. In some embodiments the cell may be genetically engineered to harbor such a mutation or genetic variation.

In some embodiments the cell is a diseased cell. A diseased cell is one that exhibits one or more manifestations of a disorder. For example, in some embodiments the cell is a cancer cell. A cancer cell may be derived from any type of cancer. “Cancer” as used herein, encompasses any type of cancer, including solid tumors (e.g., carcinomas, sarcomas), and hematologic malignancies. Solid tumors include, e.g., bladder, bone, brain (e.g., glioblastoma), breast, cervical, colon, endometrial, esophageal, gastric, liver (e.g., hepatocellular carcinoma), lung, ovarian, pancreatic, prostate, renal, skin, testicular, and thyroid cancer. Others include melanoma, retinoblastoma, and neuroblastoma. Hematological malignancies include, e.g., leukemias, lymphomas (also a solid tumor), and myeloma. In some embodiments a lymphoma is a B cell lymphoma, T cell lymphoma, Burkitt's lymphoma, Hodgkin lymphoma, mantle cell lymphoma, NK cell lymphoma, or diffuse large cell lymphoma. In some embodiments a tumor is a gastrointestinal stromal tumor, e.g., a succinate dehydrogenase (SDH)-deficient gastrointestinal stromal tumor. In some embodiments a tumor is Wilms' tumor. In some embodiments a tumor is part of a multitumor syndrome, e.g., Carney triad (paragangliomas, gastric stromal tumours and pulmonary chondromas), or the dyad of paragangliomas and gastric stromal sarcomas (Carney-Stratakis syndrome). In some embodiments the disorder is a precancerous condition (i.e., a condition that can evolve into a cancer) such as myelodysplastic syndrome. In some embodiments the cancer cell is experimentally generated by expressing one or more oncogenes and/or inhibiting expression of one or more tumor suppressor genes in the cell. In some embodiments the cell is a cancer stem cell. In some embodiments a cell is obtained from a subject suffering from a disorder.
A cell obtained from a subject suffering from a disorder could be the originally isolated cell or a descendant of the cell arising in cell culture after isolation of the cell. In some embodiments the disorder is cancer. In some embodiments the disorder is an autoimmune disorder. In some embodiments the disorder is a neurodegenerative disorder. In some embodiments the disorder is a psychiatric disorder.

A cell may be in a living animal, e.g., a mammal, or may be an isolated cell. Isolated cells may be primary cells, such as those recently isolated from an animal (e.g., cells that have undergone none or only a few population doublings and/or passages following isolation, e.g., up to 3-5, or up to 5-10 doublings or passages), or may be a cell of a cell line that is capable of prolonged proliferation in culture (e.g., for longer than 3 months) or indefinite proliferation in culture (immortalized cells). In some embodiments, a cell is a somatic cell. Somatic cells may be obtained from an individual, e.g., a mouse, human, or other mammal, and cultured according to standard cell culture protocols known to those of ordinary skill in the art. Cells may be obtained from any organ or tissue of interest. In some embodiments, cells are obtained from bladder, blood, blood vessel (e.g., artery or vein), breast, endocrine gland, brain, eye, exocrine gland, fat, gastrointestinal tract (e.g., stomach, small intestine, colon), heart, kidney, liver, lung, muscle, ovary, prostate gland, skin, testis, or urethra. Cells may be maintained in cell culture following their isolation. In certain embodiments, the cells are passaged or allowed to double once or more following their isolation from an individual (e.g., between 2-5, 5-10, 10-20, 20-50, 50-100 times, or more) prior to their use in a method described herein. In some embodiments, cells may be frozen and subsequently thawed prior to use. Cells may be frozen in a suitable medium (e.g., containing a cryopreservative) to help maintain viability. In some embodiments, the cells will have been passaged or permitted to double no more than 1, 2, 5, 10, 20, or 50 times following their isolation from the individual prior to their use in a method described herein. Cells may be genetically modified or not genetically modified in various embodiments. Cells may be obtained from normal or diseased tissue in various embodiments.

In some aspects, described herein are populations of cells, e.g., isolated cells, that comprise a nucleic acid comprising an RGM construct integrated into their genome. In some embodiments a population of isolated cells in any embodiment may be composed mainly or essentially entirely of cells of a particular cell type or of cells in a particular cell state. A population of isolated cells in any embodiment may additionally or alternately be composed mainly or essentially entirely of cells that have a particular genetic modification or combination thereof. In some embodiments, an isolated population of cells consists of at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, or 100% cells of a particular cell type or cell state (i.e., the population is at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% pure), e.g., as determined by expression of one or more markers or by any other suitable method. In some embodiments, an isolated population of cells consists of at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, 99.5%, 99.9%, or 100% cells that have a particular genetic modification or combination thereof. A population of cells could be derived from a single cell or from multiple cells. In some embodiments a population of cells is derived from a single cell that has one or more particular genetic modification(s). For example, the cell may have a DNA methylation reporter integrated into its genome at a particular location. The cell may also have any one or more of the other genetic modifications described herein.
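
By way of illustration only, the purity of a population as described above (the percentage of cells of a particular cell type or cell state) can be computed from per-cell classifications, e.g., classifications based on marker expression. The following Python sketch is a hypothetical example; the function names, the cell type labels, and the threshold values are assumptions introduced for illustration.

```python
def purity_percent(cell_labels, target_label):
    """Percent of cells in the population whose label equals target_label."""
    if not cell_labels:
        raise ValueError("population must contain at least one cell")
    n_target = sum(1 for label in cell_labels if label == target_label)
    return 100.0 * n_target / len(cell_labels)

def meets_purity(cell_labels, target_label, threshold_percent):
    """True if the population is at least threshold_percent pure."""
    return purity_percent(cell_labels, target_label) >= threshold_percent

# Hypothetical population: 19 cells classified as neurons and 1 as a glial cell.
example_population = ["neuron"] * 19 + ["glial cell"]
example_purity = purity_percent(example_population, "neuron")
```

With the hypothetical population shown, the population would satisfy an "at least 95%" criterion but not an "at least 96%" criterion.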

In some embodiments a population of cells in any embodiment comprises cells of multiple different cell types or of cells in multiple different cell states. For example, the population may comprise cells of at least 2, 3, 4, 5, or more different cell types, cell states, or combinations thereof. In some embodiments, an RGM construct is useful for understanding changes in DNA methylation that occur during cell state transition in heterogeneous cell populations. In some embodiments, an RGM construct is useful for understanding changes in DNA methylation that occur in particular cell types in heterogeneous cell populations. In some embodiments cells of a particular cell type or cell state of interest may be identified by their expression of cell type or cell state specific marker(s) or reporter gene(s) under control of cell type or cell state specific promoters. Since RGM constructs in certain embodiments allow measuring dynamics of DNA methylation at single-cell resolution, methylation changes occurring in particular cells of interest can be detected and distinguished from those occurring in the overall population.
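
By way of illustration only, the distinction described above between cells of interest and the overall population can be sketched as a gating step on per-cell measurements, such as those obtained by flow cytometry or imaging. In the following Python sketch, each cell is represented by a hypothetical pair of values (signal from a cell type specific marker or reporter, and signal from the RGM reporter molecule); the function name, the signal values, and the gating threshold are assumptions introduced for illustration and do not represent measured data.

```python
from statistics import median

def summarize_reporter(cells, marker_threshold):
    """Compare reporter signal in marker-positive cells with the whole population.

    cells is a list of (marker_signal, reporter_signal) pairs, one per cell.
    Cells with marker_signal >= marker_threshold are treated as cells of interest.
    """
    if not cells:
        raise ValueError("at least one cell is required")
    gated = [reporter for marker, reporter in cells if marker >= marker_threshold]
    overall = [reporter for _, reporter in cells]
    return {
        "n_gated": len(gated),
        "gated_median": median(gated) if gated else None,
        "overall_median": median(overall),
    }

# Hypothetical per-cell values: three marker-positive and two marker-negative cells.
example_cells = [(10, 100), (12, 120), (1, 900), (0, 950), (11, 110)]
example_summary = summarize_reporter(example_cells, marker_threshold=5)
```

In this hypothetical data set the median reporter signal of the gated cells differs from the median of the overall population, which is the kind of difference that a single-cell readout can resolve and a bulk measurement cannot.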

In some embodiments a cell is a member of a cell line. Cell lines can be generated using methods known in the art or obtained, e.g., from depositories or cell banks such as the American Type Culture Collection (ATCC), Coriell Cell Repositories, Deutsche Sammlung von Mikroorganismen und Zellkulturen (German Collection of Microorganisms and Cell Cultures; DSMZ), European Collection of Cell Cultures (ECACC), Japanese Collection of Research Bioresources (JCRB), RIKEN, Cell Bank Australia, etc. The paper and online catalogs of the afore-mentioned depositories and cell banks are incorporated herein by reference.

Cells or cell lines may be of any cell type or tissue of origin in various embodiments. In some embodiments the cell is an adipocyte (e.g., white fat cell or brown fat cell), cardiomyocyte, chondrocyte, epithelial cell, endothelial cell, endocrine gland cell, exocrine gland cell, fibroblast, glial cell (e.g., astrocyte, oligodendrocyte, microglial cell, Schwann cell), hepatocyte, keratinocyte, melanocyte, mesenchymal cell, neuron, osteoblast, osteoclast, pancreatic islet cell (e.g., a beta cell, alpha cell), skeletal myocyte, or smooth muscle cell. In some embodiments the cell is an immune cell, e.g., a B cell, plasma cell, T cell (e.g., cytotoxic, helper, regulatory, killer), dendritic cell, natural killer cell, macrophage, monocyte, neutrophil, eosinophil, basophil, or mast cell. In some embodiments a neuron is of a type that is normally found in the central nervous system (CNS), e.g., the brain. In some embodiments a neuron is of a type normally found in the peripheral nervous system (PNS). Neurons can be classified according to morphology, type(s) of neurotransmitter(s) that they produce or to which they respond, and/or region of the CNS or PNS in which they are normally found. In some embodiments a cell is a neuron that produces or responds to a particular neurotransmitter of interest. Neurotransmitters include, e.g., acetylcholine, dopamine, epinephrine, gamma-aminobutyric acid, glutamate, glycine, and serotonin. In some embodiments, an enzyme that acts in a biosynthetic pathway leading to production of a given neurotransmitter, or a receptor that binds to or takes up such neurotransmitter, may be a cell type specific marker for the particular subtype of neuron that produces or responds to the neurotransmitter, respectively.

In some embodiments the cell has one or more genetic modifications in addition to the insertion of an RGM construct into its genome. In some embodiments the cell has a mutation or other genetic variation (e.g., a polymorphism) in a gene encoding a protein or RNA of interest. The mutation or genetic variation may be naturally occurring or engineered. In some embodiments the mutation or genetic variation is a disease-associated mutation, e.g., a mutation or genetic variation that causes a disorder or is associated with an increased risk of developing a disease. In some embodiments the disorder is a disorder associated with aberrant DNA methylation.

In some embodiments the cell is engineered to have increased or decreased expression or activity of a protein or RNA of interest as compared with a non-engineered control cell. The protein(s) or RNA(s) of interest can be any protein(s) or RNA(s) of interest. In some embodiments the protein is a chromatin modifying enzyme, e.g., a DNA modifying enzyme or histone modifying enzyme. For example, in some embodiments the protein is a DNA methyltransferase, a DNA demethylase, a DNA glycosylase, a histone methyltransferase, a histone demethylase, a histone acetylase, or a histone deacetylase. In some embodiments the protein is a DNA repair enzyme.

A cell may be engineered to have increased or decreased expression or activity of a protein or RNA of interest using any of a variety of methods known in the art. In some embodiments a cell may be engineered to have increased expression or activity of a protein or RNA of interest by introducing into the cell an expression construct that encodes the protein or RNA. The protein or RNA may be one that is naturally found in the cell or may be one that is not naturally found in the cell in various embodiments. In some embodiments a cell may be engineered to have increased expression or activity of an endogenous protein or RNA of interest by introducing into the cell an artificial transcriptional regulator designed to increase activity of the endogenous promoter that naturally directs transcription of such endogenous protein or RNA. In some embodiments a cell may be engineered to have decreased expression or activity of a protein or RNA of interest by introducing a mutation or deletion into the gene that encodes the RNA or protein (e.g., using a targetable nuclease), by expressing an RNAi agent such as a short hairpin RNA or artificial microRNA in the cell, or by expressing an antisense RNA in the cell. In some embodiments a cell may be engineered to have decreased expression or activity of an endogenous protein or RNA of interest by introducing into the cell an artificial transcriptional regulator designed to decrease activity of the endogenous promoter that directs transcription of such endogenous protein or RNA. In some embodiments the protein is a transcription factor. Those of ordinary skill in the art are aware of the numerous mammalian transcription factors. In some embodiments the transcription factor is included in the TRANSFAC® or JASPAR database. In some embodiments the transcription factor is a master transcription factor.

In some embodiments, an RNA or protein to be expressed in a cell is under the control of a regulatable (inducible or repressible) promoter. One of ordinary skill in the art appreciates that various regulatable promoter systems are available. For example, the tetracycline-regulatable gene expression system or variants thereof (see, e.g., Gossen & Bujard, Proc. Natl. Acad. Sci. 89:5547-5551, 1992; Allen, N, et al. (2000) Mouse Genetics and Transgenics: 259-263; Urlinger, S, et al. (2000). Proc. Natl. Acad. Sci. U.S.A. 97 (14): 7963-8; Zhou, X., et al (2006). Gene Ther. 13 (19): 1382-1390; Schonig, K., et al., Methods Enzymol. 2010; 477:429-53) can be employed to provide inducible or repressible expression. One of ordinary skill appreciates that small molecules such as tetracycline, doxycycline, 4-Epidoxycycline, steroid hormones, and the like, can be used. In some embodiments a protein's activity may be regulatable using a small molecule. For example, Cre may be fused to a steroid hormone ligand binding domain so that its activity is regulated by receptor ligands. Cre-ER(T) or Cre-ER(T2) recombinases may be used, which comprise a fusion protein between a mutated ligand binding domain of the human estrogen receptor (ER) and the Cre recombinase, the activity of which can be induced by, e.g., 4-hydroxy-tamoxifen. In some embodiments, such systems may be used to control expression of any endogenous gene or exogenously introduced nucleic acid in a tissue specific, temporally defined, and/or reversible manner.

Non-Human Mammals

In some aspects, described herein are non-human mammals comprising at least one cell that comprises a nucleic acid comprising an RGM construct integrated into its genome. In some embodiments the nucleic acid is integrated in proximity to a region of interest. The region of interest may be any of the regions of interest described herein. For example, in some embodiments the region of interest is a superenhancer, enhancer, promoter, DMR, disease-specific DMR, tissue-specific DMR, CpG island, or gene body. In some embodiments the RGM construct is integrated within 10 kb, 20 kb, or 50 kb of a TSS.

In some embodiments, the non-human animal is a chimeric animal. In some embodiments between about 5% and about 95% of the animal's cells have the nucleic acid construct integrated into their genome. In some embodiments at least some germline cells of the animal harbor the nucleic acid in their genome. In some embodiments all or substantially all (e.g., at least 99%, 99.5%, 99.9%) of the cells of the non-human mammal have the nucleic acid integrated into their genome. In some embodiments cells of the non-human mammal have a single copy of the nucleic acid comprising an RGM construct integrated into their genome in proximity to a region of interest. In some embodiments the region of interest is in an autosome, and cells of the non-human mammal comprise a nucleic acid comprising an RGM construct integrated into their genome in proximity to both copies of the region of interest (i.e., in proximity to each allele of the region of interest).

The nonhuman mammals can be produced using any of a variety of methods for producing genetically modified non-human mammals known in the art. In some embodiments, a method of use to produce nonhuman mammals includes pronuclear microinjection. DNA is introduced directly into a pronucleus of a nonhuman mammal egg just after fertilization (e.g., by microinjection or piezoinjection). The egg is implanted into an appropriate foster mother, e.g., a pseudopregnant female of the same species (e.g., into the oviduct of such female). The female is then maintained under conditions that result in development of live offspring that harbor the one or more genetic modifications. Offspring are screened for the integrated DNA. Heterozygous offspring can be subsequently mated to generate homozygous animals. In the context of the present disclosure, the DNA which is introduced into the pronucleus of a non-human mammalian egg is a nucleic acid comprising an RGM construct. In some embodiments the nucleic acid comprises homology arms as described above, in order to promote homologous recombination to introduce the RGM construct into the genome in proximity to a region of interest.

In some embodiments, non-human mammals are generated from pluripotent cells, e.g., ES or iPS cells, using conventional methods. See, e.g., U.S. Patent Application Pub. No. 20110076678 for examples of generating non-human mammals from iPS cells. Such methods may be used to generate non-human mammals from ES cells. The ES or iPS cell used to derive a non-human mammal has a nucleic acid comprising an RGM construct integrated into its genome.

In some embodiments a technique useful for generating non-human mammals of the present disclosure involves introducing one or more ES and/or iPS cells comprising one or more genetic modifications into a diploid blastocyst and maintaining the blastocyst under conditions that result in development of an embryo. The embryo is then transferred into an appropriate foster mother, such as a pseudopregnant female (e.g., of the same species as the embryo). The foster mother is then maintained under conditions that result in development of live chimeric offspring that harbor the one or more genetic modifications in some of their cells. In the context of the present disclosure, the ES and/or iPS cells have an RGM construct integrated into their genome, e.g., in proximity to a region of interest. Chimeric animals in which the ES and/or iPS cells have contributed to the germline (i.e., the germ line of the chimeric animal contains cells derived from the introduced ES or iPS cells) can be bred to generate animals that have the genetic modification in all or substantially all of their cells.

In some embodiments a method of producing a non-human mammal comprises injecting non-human mammalian ES cells or iPSCs that are genetically modified to harbor an RGM construct integrated into their genome into a non-human tetraploid blastocyst, transferring the blastocyst into an appropriate foster mother, e.g., a pseudopregnant female of the same species, and allowing the blastocyst to develop. The resulting non-human mammal is derived from the ES cells or iPSCs and thus harbors the RGM construct in its cells. In some embodiments, said non-human mammalian ES cells or iPSCs are mouse cells and said non-human mammalian embryo is a mouse embryo. In some embodiments, said mouse cells are injected into said non-human tetraploid blastocysts by microinjection. In some embodiments laser-assisted micromanipulation or piezoinjection is used.

In some embodiments a non-human mammal comprising cells that comprise a nucleic acid comprising an RGM construct in their genome is generated from a zygote (a one cell embryo) comprising a targetable nuclease and an RGM construct, wherein the targetable nuclease cleaves genomic DNA at a target site in the region of interest, promoting integration of the nucleic acid by homology directed repair. For example, in some embodiments the nonhuman animal is generated from a zygote comprising a guide RNA, Cas9 protein, and a nucleic acid comprising an RGM construct, wherein the guide RNA guides the Cas9 protein to cleave the genomic DNA of the zygote at a target site in the region of interest. The guide RNA, Cas9 protein, and nucleic acid may be introduced into the zygote using a variety of methods. In some embodiments Cas9 mRNA, sgRNA, and nucleic acid comprising an RGM construct are introduced into the zygote, e.g., by injection. In some embodiments Cas9 protein, sgRNA, and nucleic acid comprising an RGM construct are introduced into the zygote, e.g., by injection. The zygote may be cultured in vitro, e.g., to the blastocyst stage, and transferred into a foster nonhuman mammalian mother. The foster nonhuman mammalian mother is maintained under conditions suitable for production of one or more offspring harboring the nucleic acid comprising an RGM construct in their genome, thereby producing a nonhuman mammal comprising an RGM construct in its genome.

In some embodiments a nucleic acid comprising an RGM construct may be introduced into an embryo, fetus, post-natal, juvenile, or adult non-human mammal. In some embodiments the nucleic acid may be injected into an organ or tissue such as the heart, brain, liver, etc. The nucleic acid may be taken up by some of the animal's cells and integrate into the genome.

In some embodiments the non-human mammal is of any mouse strain known in the art. Examples include C57BL/6J, 129S1/SvImJ, A/J, AKR/J, BALB/cByJ, BTBR T+tf/J, C3H/HeJ, CAST/EiJ, DBA/2J, FVB/NJ, MOLF/EiJ, KK/HIJ, NOD/ShiLtJ, NZW/LacJ, PWD/PhJ, WSB/EiJ, CD-1, CBA, ICR, and Balb/C. In some aspects, various mouse strains and mouse models of human disease are used in conjunction with the methods of producing a nonhuman mammal comprising an RGM construct integrated into its genome described herein. One of ordinary skill in the art appreciates the thousands of commercially and non-commercially available strains of laboratory mice that have specific genetic modifications (e.g., transgenes, knockouts, tissue or cell type specific Cre recombinase lines, Tet transactivator lines, Tet responder lines), which may be constitutive or conditional (e.g., inducible). One of ordinary skill in the art also appreciates the thousands of commercially and non-commercially available strains of laboratory mice for modeling human disease. For example, numerous mouse strains harboring particular genetic modifications and/or useful as models for human disease are available from Jackson Laboratories (Bar Harbor, Me.) (JAX® mice), RIKEN, EMMA, Taconic Biosciences (Hudson, N.Y.), and other sources. Mouse models exist for diseases such as cancer, cardiovascular disease, autoimmune diseases and disorders, inflammatory diseases, diabetes, neurological diseases (including neurodegenerative disease and neurodevelopmental diseases), psychiatric diseases, endocrine deficiency, hearing loss, hematological disease, inflammation, musculoskeletal disorders, metabolic disease, vision loss, and other diseases. In some aspects, a method of producing a nonhuman mammal comprising an RGM construct in its genome further comprises mating one or more commercially and/or non-commercially available nonhuman mammals with the nonhuman mammal comprising an RGM construct in its genome produced by the methods described herein. In some aspects, nonhuman mammals produced by the methods described herein are provided.

In some embodiments, methylation state of a region of interest may be detected or monitored in vivo in a non-human mammal (e.g., a mouse) comprising an RGM construct integrated into its genome in proximity to the region of interest. Suitable methods for performing in vivo imaging are known in the art.

In some aspects, the present disclosure provides isolated cells obtained from any of the non-human mammals described herein, wherein the cells comprise a nucleic acid comprising an RGM construct integrated into their genome. The cells may be obtained from any organ or tissue of the animal and may be of any cell type (see discussion of various tissues, organs, and cell types above). It should be understood that cells “obtained” from a subject such as a non-human animal include the cells originally removed from the animal as well as progeny of those cells. In some embodiments DNA methylation of a region of interest may be detected or monitored in the cells using the RGM construct, as described herein.

In some aspects, the present disclosure provides tissue or organ samples obtained from any of the non-human mammals described herein, wherein cells in the tissue or organ sample comprise a nucleic acid comprising an RGM construct integrated into their genome. The tissue or organ sample may be obtained from any organ or tissue of the animal. In some embodiments DNA methylation of a region of interest may be detected or monitored in cells in the sample using the RGM construct, as described herein.

In some embodiments two or more biological samples comprising cells may be obtained from a non-human mammal. The methylation state of a region of interest at a first time point is compared with the methylation state of the same region of interest at one or more subsequent time points. The samples may be obtained from the same tissue or organ (e.g., blood cells, skin cells).

In some embodiments animals generated according to methods described herein may be useful in the identification of candidate agents for treatment of disease and/or for testing agents for potential toxicity or side effects, such as those potentially arising from aberrant methylation of a region of interest. In some embodiments any method described herein may comprise contacting an animal generated according to methods described herein with a test agent (e.g., a small molecule, nucleic acid, polypeptide, lipid, etc.).

Kits

The disclosure further provides packaged products and kits, including a construct or composition described herein, packaged into suitable packaging material. The term “packaging material” refers to a physical structure housing the product or components of the kit. The packaging material can maintain the components sterilely, and can be made of material commonly used for such purposes (e.g., paper, corrugated fiber, glass, plastic, foil, ampules, etc.).

In certain embodiments, a packaged product or kit includes a container, such as a sealed pouch or shipping container, or an article of manufacture, for example, to carry out an assay described herein, such as a tissue culture dish, tube, flask, roller bottle or plate (e.g., a single multi-well plate or dish such as an 8, 16, 32, 64, 96, 384 and 1536 multi-well plate or dish).

A label or packaging insert can be included, listing contents or appropriate written instructions, for example, for practicing a method of the disclosure. Instructions may be on “printed matter,” e.g., on paper or cardboard within the kit, on a label affixed to the package, kit or packaging material, or attached to a tissue culture dish, tube, flask, roller bottle, plate (e.g., a single multi-well plate or dish such as an 8, 16, 32, 64, 96, 384 and 1536 multi-well plate or dish) or vial containing a component of the kit. Instructions may comprise voice or video tape and additionally be included on a computer readable medium, such as a disk (floppy diskette or hard disk), an optical disc such as CD- or DVD-ROM/RAM, magnetic tape, electrical storage media such as RAM and ROM, and hybrids of these such as magnetic/optical storage media.

Disclosed kits can optionally include additional components, such as a buffering agent, a preservative, or a reagent. Each component of the kit can be enclosed within an individual container or in a mixture, and all of the various containers can be within single or multiple packages.

In some aspects, the present disclosure provides kits containing any one or more of the RGM constructs, nucleic acids, and/or vectors described herein. A kit may comprise 2, 3, 4, or more RGM constructs, nucleic acids, and/or vectors, at least some of which may comprise different reporter genes. In some embodiments a kit may comprise a transfection reagent, a DNA modifying enzyme, a buffer solution, or a cell. In some embodiments a kit may comprise instructions for use of the kit to detect or monitor DNA methylation in cells.

Applications

DNA methylation reporter constructs described herein have a number of different uses and may be used in a wide variety of methods. This section describes certain non-limiting applications and methods of use. In general, any of the RGM constructs described herein and/or any of the mammalian cells described herein may be used in various embodiments, unless otherwise indicated or unless the context clearly dictates otherwise. Without limiting the foregoing, in some embodiments mouse cells or human cells are used; in some embodiments an RGM construct encoding a fluorescent protein or a luciferase as a reporter molecule may be used.

In some embodiments, a DNA methylation reporter is used to detect methylation state (or change in methylation state) of a genomic region of interest during a cell identity transition. A “cell identity transition” is a change from one cell type to another cell type. In some embodiments, a DNA methylation reporter described herein is used to detect methylation state of a genomic region of interest during a cell state transition. In some embodiments a cell state transition is a transition from one state of differentiation to a second state of differentiation within a particular cell lineage. In some embodiments a cell state transition is a change from a pluripotent state to a non-pluripotent state. In some embodiments a cell state transition is a change from a non-pluripotent state (e.g., a unipotent or multipotent state) to a pluripotent state. In some embodiments a cell state transition is a change from a pluripotent state to a multipotent state. In some embodiments a cell state transition is a change from a multipotent state to a unipotent state. In some embodiments a cell state transition is a change from a terminally differentiated state to a multipotent or pluripotent state. In some embodiments a cell state transition is a change from a post-mitotic state to an actively dividing state. In some embodiments, a DNA methylation reporter described herein is used to detect methylation state of a genomic region of interest one or more times prior to the beginning of a cell identity transition or cell state transition and one or more times after a cell identity transition or cell state transition has started and/or one or more times after a cell identity transition or cell state transition has occurred. A methylation reporter may thus be used to detect a change in methylation state that occurs during or is temporally correlated with a cell identity transition or cell state transition. DNA methylation during or correlated with any type of cell identity transition or cell state transition can be detected in various embodiments. In some embodiments the cell identity transition or cell state transition occurs in an isolated cell, e.g., a cell in cell culture. In some embodiments the cell identity transition or cell state transition occurs in vivo, i.e., within a living animal.

In some aspects, cell state reflects the fact that cells of a particular type can exhibit variability with regard to one or more features and/or can exist in a variety of different conditions, while retaining the features of their particular cell type and not gaining features that would cause them to be classified as a different cell type. The different states or conditions in which a cell can exist may be characteristic of a particular cell type (e.g., they may involve properties or characteristics exhibited only by that cell type and/or involve functions performed only or primarily by that cell type) or may occur in multiple different cell types. Sometimes a cell state reflects the capability of a cell to respond to a particular stimulus or environmental condition (e.g., whether or not the cell will respond, or the type of response that will be elicited) or is a condition of the cell brought about by a stimulus or environmental condition. Cells in different cell states may be distinguished from one another in a variety of ways. For example, they may express, produce, or secrete one or more different genes, RNAs, proteins, or other molecules, exhibit differences in protein modifications such as phosphorylation, acetylation, etc., or may exhibit differences in appearance. Thus a cell state may be a condition of the cell in which the cell expresses, produces, or secretes one or more markers, exhibits particular protein modification(s), has a particular appearance, and/or will or will not exhibit one or more biological response(s) to a stimulus or environmental condition. Markers can be assessed using methods well known in the art, e.g., gene expression can be assessed at the mRNA level using Northern blots, cDNA or oligonucleotide microarrays, or sequencing (e.g., RNA-Seq), or at the level of protein expression using protein microarrays, Western blots, flow cytometry, immunohistochemistry, etc. Modifications can be assessed, e.g., using antibodies that are specific for a particular modified form of a protein, e.g., phospho-specific antibodies, or mass spectrometry.

Another example of cell state is “activated” state as compared with “resting” or “non-activated” state. Many cell types in the body have the capacity to respond to a stimulus by modifying their state to an activated state. The particular alterations in state may differ depending on the cell type and/or the particular stimulus. A stimulus could be any biological, chemical, or physical agent to which a cell may be exposed. A stimulus could originate outside an organism (e.g., a pathogen such as a virus, bacterium, or fungus, or a component or product thereof such as a protein, carbohydrate, nucleic acid, or cell wall constituent such as bacterial lipopolysaccharide) or may be internally generated (e.g., a cytokine, chemokine, growth factor, or hormone produced by other cells in the body or by the cell itself). For example, stimuli can include interleukins, interferons, or TNF alpha. Immune system cells, for example, can become activated upon encountering foreign (or in some instances host cell) molecules. Cells of the adaptive immune system can become activated upon encountering a cognate antigen (e.g., containing an epitope specifically recognized by the cell's T cell or B cell receptor) and, optionally, appropriate co-stimulating signals. Activation can result in changes in gene expression, production and/or secretion of molecules (e.g., cytokines, inflammatory mediators), and a variety of other changes that, for example, aid in defense against pathogens but can, e.g., if excessive, prolonged, or directed against host cells or host cell molecules, contribute to diseases.

Fibroblasts are another cell type that can become activated in response to a variety of stimuli (e.g., injury (e.g., trauma, surgery), exposure to certain compounds including a variety of pharmacological agents, radiation, etc.) leading them, for example, to secrete extracellular matrix (ECM) components. In the case of response to injury, such ECM components can contribute to wound healing. However, fibroblast activation, e.g., if prolonged, inappropriate, or excessive, can lead to a range of fibrotic conditions affecting diverse tissues and organs (e.g., heart, kidney, liver, intestine, blood vessels, skin) and/or contribute to cancer.

Another example of cell state reflects the condition of a cell as either responsive (sensitive) or non-responsive (resistant) to a particular stimulus (e.g., a particular substance with which the cell is contacted, such as a hormone, growth factor, chemokine, therapeutic agent). For example, insulin-resistant skeletal muscle cells exhibit markedly reduced insulin-stimulated glucose uptake and a variety of other metabolic abnormalities that distinguish these cells from cells with normal insulin sensitivity. In some aspects, an RGM construct may be used to detect or monitor methylation changes that accompany any change in cell state.

In some embodiments a cell comprising a nucleic acid comprising an RGM construct integrated into its genome is exposed to an agent or condition that induces the cell to undergo a cell state transition or cell identity transition. For example, the cell may be subjected to a reprogramming protocol. A “reprogramming protocol” refers to any treatment or combination of treatments that causes at least some cells subjected to it to become reprogrammed. In some embodiments a “reprogramming protocol” refers to a set of manipulations (e.g., introduction of nucleic acid(s), e.g., vector(s), carrying particular genes) and/or culture conditions (e.g., culture in medium containing particular compounds) in vitro that generates pluripotent cells from somatic cells, or that generates a second differentiated cell type from a first differentiated cell type without going through a pluripotent intermediate state. The transcription factors, small molecules, or other agents that mediate reprogramming may be referred to as reprogramming agents. In some embodiments, a DNA methylation reporter is used to detect methylation state (or change in methylation state) of a genomic region of interest during natural or experimentally induced differentiation. Cells may be exposed to agents or conditions that can promote differentiation, such as retinoids (e.g., retinoic acid) or various growth factors, and/or may be subjected to withdrawal of one or more agents that promote maintenance of a particular state and thereby block differentiation.

In some embodiments, a DNA methylation reporter described herein is used to detect the effect of an agent or condition or combination thereof on the methylation state of a genomic region of interest. The agent may or may not affect the identity or state of the cell in various embodiments. In some embodiments a method of evaluating the effect of an agent on the methylation state of a DNA region of interest in a cell comprises steps of: contacting one or more cells comprising (i) a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule with a test agent; measuring expression of the reporter molecule; and comparing the level of expression of the reporter molecule with a control value, wherein a difference between the measured value and the control value indicates that the test agent modulates the methylation state of the region of interest. In general, any of a wide variety of agents can be evaluated. In some embodiments, the agent is a small molecule, polypeptide, nucleic acid, lipid, or sugar. In some embodiments a library of compounds may be tested, e.g., a small molecule library, natural product library, or peptide library. In some embodiments the agent is a nucleic acid that is introduced into the cell. For example, the nucleic acid may comprise an siRNA. In some embodiments the agent is expressed in the cell. In some embodiments a high throughput screen is performed, in which at least about 20,000 agents (e.g., small molecule compounds or nucleic acids) are tested. Cells may be placed in individual wells of a microtiter plate with different compounds. Agents that increase or inhibit methylation or demethylation of one or more ROIs may be identified.
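By way of non-limiting illustration, the comparison of a measured reporter level with a control value could be sketched as follows. The function name `call_hits`, the z-score criterion, the cutoff of 3.0, and the example signal values are assumptions introduced only for this sketch and are not part of the disclosure; any suitable statistical comparison could be used instead.

```python
from statistics import mean, stdev

def call_hits(control_signals, test_signals, z_cutoff=3.0):
    """Flag agents whose reporter signal differs from the control wells.

    control_signals: reporter readings from untreated (control) wells.
    test_signals: mapping of agent name -> reporter reading.
    z_cutoff: hypothetical threshold; chosen here only for illustration.
    """
    mu = mean(control_signals)
    sigma = stdev(control_signals)
    if sigma == 0:
        raise ValueError("control wells show no variation; cannot compute z-scores")
    hits = {}
    for agent, signal in test_signals.items():
        z = (signal - mu) / sigma
        if abs(z) >= z_cutoff:
            # Record only the direction of the reporter change; how that maps
            # to a gain or loss of methylation depends on the construct used.
            hits[agent] = "increased" if z > 0 else "decreased"
    return hits

# Illustrative (made-up) readings, in arbitrary fluorescence units.
controls = [100.0, 98.0, 102.0, 101.0, 99.0]
tests = {"agent_A": 100.5, "agent_B": 40.0, "agent_C": 160.0}
hits = call_hits(controls, tests)
```

In this sketch, an agent is reported only when its reporter reading lies well outside the spread of the control wells; agents such as the hypothetical `agent_A`, whose reading is within the control range, are not flagged.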

In some aspects of any screening and/or characterization methods, agents may be contacted with cells comprising an RGM construct, sometimes referred to as “test cells” (and optionally control cells), at one or more predetermined concentrations. In some embodiments the concentration is up to about 1 nM. In some embodiments the concentration is between about 1 nM and about 100 nM. In some embodiments the concentration is between about 100 nM and about 10 μM. In some embodiments the concentration is at or above 10 μM, e.g., between 10 μM and 100 μM. Following incubation for an appropriate time, optionally a predetermined time, the effect of agents or compositions on the level of the reporter molecule in the test cells is determined. Cells can be contacted for various periods of time. In certain embodiments cells are contacted for between 12 hours and 20 days, e.g., for between 1 and 10 days, for between 2 and 5 days, or any intervening range or particular value. Cells can be contacted transiently or continuously. If desired, the agent can be removed prior to assessing the effect on the cells.

Conditions that may be tested or used in various embodiments may include electrical or mechanical stimulation, exposure to other cells or cell products such as extracellular matrix components, growth on or in particular substrates or matrices, etc. In some embodiments the methylation state of a region of interest, e.g., a superenhancer, enhancer, or promoter of a cell type specific gene or cell state specific gene, is detected or monitored as a cell undergoes a cell identity or cell state transition such as reprogramming or differentiation or is exposed to agent(s) or condition(s) that might or might not promote reprogramming or differentiation (e.g., agents being tested for use in such processes). In some embodiments, particular methylation changes that accompany or are required for reprogramming or differentiation may be identified. In some embodiments differences in methylation state of a ROI between cells that are in different states or have different identities may be determined. In some embodiments an agent or condition that inhibits or increases a methylation change that would normally occur during cell differentiation or that would typically occur during reprogramming is identified.

In some embodiments, the agent is a DNA methylation inhibitor. A variety of DNA methylation inhibitors are known in the art. See, e.g., Lyko, F. and Brown, R., Journal of the National Cancer Institute, 97(20):1498-1506, 2005. Inhibitors of DNA methylation include nucleoside DNA methyltransferase inhibitors such as 5-azacytidine, 5-azadeoxycytidine, and zebularine, non-nucleoside inhibitors such as the polyphenol (−)-epigallocatechin-3-gallate (EGCG) and the small molecule RG108 (2-(1,3-dioxo-1,3-dihydro-2H-isoindol-2-yl)-3-(1H-indol-3-yl)propanoic acid), compounds described in WO2005085196, and phthalamides, succinimides and related compounds as described in WO2007007054. Additional classes of compounds are: (1) 4-aminobenzoic acid derivatives, such as the antiarrhythmic drug procainamide and the local anesthetic procaine; (2) the psammaplins, which also inhibit histone deacetylase (Pina, I. C., J Org Chem., 68(10):3866-73, 2003); (3) 4-aminoquinoline-based inhibitors, such as SGI-1027 and its analogs (Rilova, E., et al., ChemMedChem. 2014 March; 9(3):590-601); and (4) oligonucleotides, including siRNAs, shRNAs, and specific antisense oligonucleotides, such as MG98. DNA methylation inhibitors may act by a variety of different mechanisms. The nucleoside inhibitors are metabolized by cellular pathways before being incorporated into DNA. After incorporation, they function as suicide substrates for DNMT enzymes. The non-nucleoside inhibitors procaine, epigallocatechin-3-gallate (EGCG), and RG108 have been proposed to inhibit DNA methyltransferases by masking DNMT target sequences (i.e., procaine) or by blocking the active site of the enzyme (i.e., EGCG and RG108). In some embodiments the agent is an inhibitor of MEK or GSK3. In some embodiments the agent is leukemia inhibitory factor (LIF).

In some embodiments, a DNA methylation reporter may be used to analyze the functional and/or temporal relationship between DNA methylation and transcription initiation or transcriptional silencing of a gene. For example, a DNA methylation reporter located in proximity to a regulatory region of a gene, such as a promoter region, could be used to determine whether methylation of the regulatory region precedes silencing of transcription of the gene, or whether silencing of transcription precedes methylation.

In some aspects, a DNA methylation reporter may be used in cell lineage tracing. Lineage tracing refers to identifying the descendants of a single cell. In lineage tracing, an individual cell is marked in such a way that the mark is transmitted to the cell's descendants, resulting in a set of marked cells that arose from the same founder cell. Lineage tracing is useful, e.g., in understanding normal tissue development, cell and tissue turnover, and disease. Among other things, it can provide information regarding the number of descendants of the founder cell, their location, and their differentiation status. In some embodiments, a marked cell whose lineage is to be traced can be introduced into a subject, e.g., a non-human mammal. The mark allows the cell and its descendants to be distinguished from the cells and tissues of the subject. A variety of marks can be used. In general, any of the reporter genes described herein can serve as a genetic label that marks a cell, e.g., for purposes of lineage tracing. Genetic labels that are particularly suitable for lineage tracing include those that encode fluorescent proteins, luciferases, and enzymes that act on a substrate to produce a colored substance. If stably integrated into the genome, a genetic label is inherited by the cell's descendants, thus marking them permanently. In some embodiments, cells are marked as a result of recombination mediated by a site-specific recombinase that is encoded by a reporter gene transcribed under control of an RGM promoter integrated into the genome in proximity to a region of interest as described above. Cells in which the ROI has a particular methylation state or has undergone a change in methylation state, and their progeny, can thus be detected regardless of subsequent changes in the methylation state of the ROI.

In some embodiments an RGM construct is used together with a site-specific recombinase that is expressed in a cell- or tissue-specific manner. The site-specific recombinase activates the expression of a reporter gene as described above (e.g., through excision of a STOP cassette), resulting in permanent genetic labeling of all descendants of the marked cells. The genetic label can then be used to identify the cells in which the tissue- or cell-specific promoter was active, and the RGM reporter molecule can be used to determine the level of methylation of the region of interest.

In some aspects, DNA methylation reporter constructs described herein could be used in combination with any of a variety of methods and tools known in the art that are useful for marking, tracking, and/or manipulating cells in vitro or in vivo, such as multicolor labelling by electroporation of plasmids (e.g., methods known as StarTrack, MAGIC, and CLoNe), DNA barcoding, LeGO vectors, Brainbow technology, RGB marking, and optogenetics. RGB marking, which refers to the tagging of individual cells with unique hues resulting from simultaneous expression of the three basic colors red, green and blue, provides a convenient toolbox for the study of CNS anatomy at the single-cell level. RGB marking may be performed using γ-retroviral and lentiviral vector sets (Gomez-Nicola, D., et al., Sci Rep. 2014; 4: 7520). In some embodiments, such methods may be used to detect or track single cells or clones of cells harboring a DNA methylation reporter in vitro or in vivo. The DNA methylation reporter is used to detect or monitor methylation of a region of interest. In particular embodiments, cells harboring a DNA methylation reporter construct may be detected or tracked in the central nervous system, in the hematopoietic system, in an organ or organism undergoing development or regeneration or wound healing, during an immune response, or in a tumor.

In some aspects, a DNA methylation reporter may be used to evaluate the effect of an agent on methylation of a DNA region of interest, identify agents that modulate methylation of a region of interest, or identify candidate therapeutic agents for treating a disease characterized by aberrant methylation of a region of interest (e.g., in which aberrant methylation of a region of interest causes, wholly or partly, or contributes to the disease or to one or more symptoms of the disease). As used herein, “treating” a disease is understood to include, for example, ameliorating the disease in whole or in part, reducing the severity of the disease, eliminating, alleviating or reducing one or more symptoms of the disease, etc.

In some aspects, described herein is a method of evaluating the effect of an agent on the methylation state of a DNA region of interest in a cell comprising steps of: contacting one or more cells comprising an RGM construct integrated in proximity to a region of interest in the genome with a test agent; measuring expression of the reporter molecule; and comparing the level of expression of the reporter molecule with a control value, wherein a difference between the measured value for the level of expression and the control value indicates that the test agent modulates the methylation state of the region of interest. An agent that modulates the methylation state of a region of interest may be referred to as a methylation modulator. In some embodiments cells are subjected to conditions that would normally cause an alteration in the methylation state of an ROI. In some embodiments an agent that inhibits such alteration, e.g., prevents it from occurring, may be identified.
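By way of non-limiting illustration only, the comparing step of the foregoing method might be sketched computationally as follows. The function name `classify_agent`, the use of a mean reporter signal per sample, and the two-fold cutoff are assumptions introduced for this sketch and are not taken from the description; any suitable statistic or threshold could be substituted.

```python
from statistics import mean


def classify_agent(treated_signal, control_signal, min_fold_change=2.0):
    """Compare reporter signal from agent-treated cells with a control value.

    treated_signal / control_signal: per-cell or per-well reporter readings
    (e.g., fluorescence intensities). The fold-change cutoff is a
    hypothetical choice for illustration, not a value from the description.
    """
    treated = mean(treated_signal)
    control = mean(control_signal)
    if treated <= 0 or control <= 0:
        raise ValueError("reporter signals must be positive")

    ratio = treated / control
    if ratio >= min_fold_change:
        # Higher reporter expression than control
        return "reporter increased"
    if ratio <= 1.0 / min_fold_change:
        # Lower reporter expression than control
        return "reporter decreased"
    return "no change"


if __name__ == "__main__":
    print(classify_agent([200, 220, 210], [100, 95, 105]))
```

In such a sketch, a result other than "no change" would flag the test agent as a candidate methylation modulator to be confirmed by an independent method such as bisulfite sequencing.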

In some embodiments, a DNA methylation reporter may be used to identify a candidate therapeutic agent for treating a disorder associated with aberrant DNA methylation. Aberrant DNA methylation plays a role in a number of different disorders. In some aspects, inhibiting development of aberrant methylation or restoring a more normal DNA methylation pattern in cells of a subject suffering from such a disorder is useful for treating the disorder. In some embodiments, a DNA methylation reporter construct is inserted in proximity to a region of genomic DNA that is aberrantly methylated in a disorder. Cells harboring the DNA methylation reporter construct in proximity to the region may be used to identify agents that affect the methylation state of the region, e.g., agents that decrease or increase methylation of the region. In some embodiments, aberrant DNA methylation of one or more regions of genomic DNA occurs in cancer cells of a subject with cancer. In some embodiments, aberrant DNA methylation of one or more regions of genomic DNA occurs in one or more types or subtypes of immune cells of a subject suffering from an autoimmune disease. In some embodiments, aberrant DNA methylation of one or more regions of genomic DNA occurs in neurons or glial cells of a subject suffering from a neurological disorder.

Non-limiting information regarding certain disorders associated with aberrant DNA methylation may be found in Longo, D., et al. (eds.), Harrison's Principles of Internal Medicine, 18th Edition; McGraw-Hill Professional, 2011 and/or in McKusick, V. A.: Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. Baltimore: Johns Hopkins University Press, 1998 (12th edition) or the more recent online database: Online Mendelian Inheritance in Man, OMIM™. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, Md.) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, Md.), available on the worldwide web at subdomain ncbi.nlm.nih.gov/omim/ and/or in Online Mendelian Inheritance in Animals (OMIA), a database of genes, inherited disorders and traits in animal species (other than human and mouse), available on the worldwide web at subdomain omia.angis.org.au/contact.shtml.

In some embodiments the disorder associated with aberrant DNA methylation is fragile X syndrome, a heritable neurodevelopmental disorder caused by a CGG repeat mutation on the X chromosome that expands the 5′-non-coding region of the fragile X mental retardation 1 (FMR1) gene (Gene ID No: 2332 (human), Gene ID No: 14265 (mouse)). FMR1 encodes the fragile X mental retardation protein (FMRP), which regulates protein expression by interacting with mRNA. The so-called full mutation (>200 CGG repeats) leads to hypermethylation of the FMR1 promoter, which transcriptionally silences FMR1 and reduces FMRP levels, resulting in the disease phenotype. In some embodiments, a DNA methylation reporter construct is inserted in proximity to the FMR1 promoter in a normal cell or in a cell harboring a CGG repeat mutation associated with fragile X syndrome. The DNA methylation reporter may be used to identify agents that affect the methylation status of the FMR1 promoter (e.g., agents that inhibit or enhance methylation of the FMR1 promoter).

Autism spectrum disorders (ASD) are increasingly common neurodevelopmental disorders characterized by impaired social interactions, impairment in communication, as well as restrictive or repetitive behaviors and interests. Aberrant DNA methylation has been implicated as playing a role in ASD. A number of genomic regions that are aberrantly methylated in cells of subjects with autism have been identified. In some embodiments the ROI is in the SHANK3 gene. SHANK3 is strongly suspected as being involved in the pathogenesis and neuropathology of ASD. Five CpG islands have been identified in the SHANK3 gene, and tissue-specific expression of SHANK3 is regulated by DNA methylation in an epigenetic manner. Increased DNA methylation has been identified in three intragenic CGIs (CGI-2, CGI-3 and CGI-4) in ASD brain tissues, associated with altered expression and alternative splicing of SHANK3 isoforms (Zhu, L., et al., Hum Mol Genet. 2014 Mar. 15; 23(6):1563-78).

In some embodiments the disorder associated with aberrant DNA methylation is cancer. Aberrant DNA methylation is a prominent finding in all cancers in which it has been studied. The transcriptional start sites of many genes that encode tumor suppressors, such as retinoblastoma-associated protein 1 (RB1), MLH1, p16, and BRCA1, among others, lie within or contain CGIs. The promoters of these genes have been found to be extensively methylated in various tumors. Promoter methylation may contribute to silencing or maintenance of silencing of tumor suppressor genes. ARHI and PEG3 are tumor suppressor genes that are themselves imprinted genes. Methylation of the promoter region of the allele of these genes that is normally expressed, leading to silencing of expression, is implicated as a cause of ovarian cancer, and re-expression of these genes can inhibit ovarian cancer growth (Feng, W., et al., Cancer. 2008; 112(7):1489-502).

In some embodiments the ROI is the promoter region of the succinate dehydrogenase C (SDHC) gene. Loss of SDH function is a driver mechanism in several cancers. SDH-deficient gastrointestinal stromal tumors (dSDH GISTs) often harbor deleterious mutations in SDH subunit genes (SDHA, SDHB, SDHC, and SDHD, termed SDHx), but some are SDHx wild type (WT). Genome-wide DNA methylation and expression profiling recently identified SDHC promoter-specific CpG island hypermethylation and gene silencing in SDHx-WT dSDH GISTs (15 of 16 cases), six in the setting of the multitumor syndrome Carney triad (Killian, J K, et al., Sci Transl Med. 2014; 6(268):268ra177), providing an explanation for the pathogenesis of dSDH GIST, whereby loss of SDH function results from either SDHx mutation or SDHC promoter hypermethylation. An agent that could at least in part reverse SDHC promoter hypermethylation is a candidate agent for treatment of cancers associated with SDHC promoter hypermethylation, including SDHx-WT dSDH GISTs.

Myelodysplastic syndrome (MDS) is a group of neoplastic disorders of hematopoietic stem cells (HSCs) that is characterized by, among other things, inefficient hematopoiesis and susceptibility to acute myeloid leukemia (AML). AML is characterized by accumulation of immature myeloid cells in the bone marrow and peripheral blood. Promoter DNA hypermethylation and associated silencing of the tumor suppressor gene CDKN2b, encoding p15INK4b, has been reported to occur in up to 80% of AML patients. The DNA methylation inhibitors 5-azacitidine (AzaC) and 5-aza-2′-deoxycytidine (decitabine) are used in the treatment of a subset of patients with these diseases and may act at least in part by reactivating expression of tumor suppressors such as CDKN2b. Methylation within gene bodies has also been observed in cancer cells and has been reported to lead to increased transcription, which could increase transcription of genes that contribute to abnormally increased cell proliferation in cancer or other proliferative disorders.

Aberrant DNA methylation is also associated with resistance of cancers to various chemotherapeutic agents, which can lead to treatment failure. In some instances, aberrant methylation can also or alternatively confer sensitivity to various agents. For example, epigenetic inactivation of argininosuccinate synthetase (ASS1), due to aberrant methylation in the ASS1 promoter, correlates with transcriptional silencing and contributes to treatment failure and clinical relapse in ovarian cancer but confers arginine auxotrophy and sensitivity to arginine deprivation (Nicholson, U, et al., Int J Cancer. 2009; 125(6): 1454-63). Downregulation of polo-like kinase 2 due to methylation of the CpG island in the Plk2 gene promoter can confer resistance to platinum-based therapy and taxane-based therapy (e.g., paclitaxel) (Syed, N., et al., Cancer Res. 2011 May 1; 71(9):3317-27). Promoter methylation in p57(Kip2) causes carboplatin resistance but also results in collateral sensitivity to the CDK inhibitor seliciclib (Coley, H M, et al., Br J Cancer. 2012; 106(3):482-9). In some embodiments an ROI is a promoter region of a gene characterized in that aberrant methylation of the ROI affects the resistance or sensitivity of a cell to a particular agent, e.g., a chemotherapeutic agent or other drug. In some embodiments an agent that inhibits or decreases methylation of a region that, when methylated, confers resistance to a therapeutic agent, could be used to prevent or reduce the likelihood of emergence of resistance to the therapeutic agent. A subject in need of treatment may be treated with both the therapeutic agent and the methylation modulator (combination therapy). In some embodiments an agent that increases methylation of a region that, when methylated, confers sensitivity to a therapeutic agent, could be used to enhance the efficacy of the therapeutic agent. A subject in need of treatment may be treated with both the therapeutic agent and the methylation modulator. It should be understood that agents administered in a combination therapy approach need not be administered in the same composition (although they may be), nor at the same time (although they may be). The agents may be administered in any appropriate temporal relationship to each other to achieve the desired effect.

In some embodiments a DNA methylation reporter may be used to evaluate or monitor the methylation state of a region of interest in cancer cells isolated from a subject with cancer or in cancer cells in vivo in a non-human subject. The cancer may have been experimentally induced by introducing cancer cells into the subject or may have arisen in a cancer-prone non-human animal. The non-human animal may be one that harbors a genetic modification that increases its risk of developing cancer, such as a knockout of a tumor suppressor gene, a transgene that encodes an oncogene, or a combination thereof. The cells may harbor an RGM construct integrated into their genome in proximity to a region of interest, e.g., a promoter or enhancer or gene body of an oncogene or tumor suppressor gene.

Aberrant DNA methylation has been linked to a wide variety of other diseases, including autoimmune diseases such as rheumatoid arthritis (Nakano K., et al., (2013) DNA methylome signature in rheumatoid arthritis, Ann Rheum Dis., 72(1):110-7) and lupus (Coit, P., et al., Genome-wide DNA methylation study suggests epigenetic accessibility and transcriptional poising of interferon-regulated genes in naïve CD4+ T cells from lupus patients. J Autoimmun. 2013 June; 43:78-84), neurodegenerative diseases such as Alzheimer's disease (De Jager, P. L. et al. Alzheimer's disease pathology is associated with early alterations in brain DNA methylation at ANK1, BIN1 and other loci. Nat Neurosci. 2014 September; 17(9):1156-63), and psychiatric disorders such as schizophrenia, depressive disorders, and bipolar disorder, to name a few.

Aberrant DNA methylation of DNA regions that regulate expression of genes involved in autoimmunity or inflammation may cause or contribute to autoimmune and inflammatory diseases. In some embodiments a gene involved in autoimmunity or inflammation encodes a cytokine-regulating protein or a cytokine-regulating microRNA (miRNA). In some embodiments a gene involved in autoimmunity is a cytokine gene, cytokine receptor gene, or cytokine-responsive gene. “Cytokine gene” refers to a gene that encodes a cytokine or cytokine subunit (chain). “Cytokine receptor gene” refers to a gene that encodes a cytokine receptor or cytokine receptor subunit (chain). Cytokines include, for example, chemokines, interferons, interleukins, lymphokines, and tumor necrosis factor alpha. In some embodiments a cytokine is an interleukin (IL), e.g., any of IL-1 to IL-38. In particular embodiments a cytokine is IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-10, IL-12, IL-15, IL-17, IL-21, IL-23, IL-27, or IL-35. In some embodiments a cytokine is an interferon, e.g., an IFN-alpha, IFN-beta, or IFN-gamma. One of ordinary skill in the art appreciates the various genes that encode chemokines, interferons, interleukins, lymphokines, tumor necrosis factor alpha, and receptors for any one or more of these. In some embodiments the cytokine is one that stimulates development, survival, activation, proliferation, and/or differentiation of one or more types or subtypes of immune system cells, e.g., T cells (e.g., CD4+ helper T cells, CD8+ cytotoxic T cells, Tregs, Th17 cells), NK cells, B cells, dendritic cells, monocytes, macrophages, or precursors of any of the foregoing. Cytokine-regulating proteins include, e.g., transcription factors that increase or decrease expression of one or more cytokines or cytokine receptors. Cytokine-regulating miRNAs include miRNAs that inhibit expression of one or more cytokines or cytokine receptors. “Cytokine-responsive gene” refers to a gene whose expression is regulated by one or more cytokines.

In some aspects, aberrant methylation of regulatory regions (e.g., promoters, enhancers, superenhancers) of cytokine genes, cytokine receptor genes, or cytokine-regulatory genes may result in aberrant expression of such genes (e.g., aberrantly increased expression of pro-inflammatory cytokines or their receptors or aberrantly reduced expression of anti-inflammatory cytokines or their receptors) which may cause or contribute to autoimmune and inflammatory diseases. For example, in some embodiments aberrantly reduced methylation of regulatory regions (e.g., promoters) of cytokine genes, cytokine receptor genes, or cytokine-regulatory genes may result in aberrantly increased expression of pro-inflammatory cytokines or their receptors, thereby causing or contributing to autoimmune disease or inflammation; in some embodiments aberrantly increased methylation of regulatory regions (e.g., promoters) of cytokine genes, cytokine receptor genes, or cytokine-regulatory genes may result in aberrantly decreased expression of anti-inflammatory cytokines or their receptors, thereby causing or contributing to autoimmune disease or inflammation. In some embodiments aberrantly reduced methylation of regulatory regions (e.g., promoters) of pro-inflammatory cytokine-responsive genes may result in aberrantly increased expression of such genes, thereby causing or contributing to autoimmune disease or inflammation. In some embodiments aberrantly increased methylation of regulatory regions (e.g., promoters) of anti-inflammatory cytokine-responsive genes may result in aberrantly decreased expression of such genes, thereby causing or contributing to autoimmune disease or inflammation.

In some embodiments, aberrant DNA methylation of a regulatory region (e.g., a promoter, enhancer, or superenhancer) of a gene that regulates the development, survival, activation, activity, proliferation, and/or differentiation of immune cells may cause or contribute to autoimmune or inflammatory disease. For example, in some embodiments, aberrant DNA methylation of a regulatory region (e.g., a promoter, enhancer, or superenhancer) of a gene that regulates development, survival, activation, activity, proliferation, and/or differentiation of immune cells promotes the development, survival, activation, proliferation, and/or differentiation of one or more types or subtypes of immune system cells that causes or contributes to an autoimmune or inflammatory disease or inhibits the development, survival, activation, activity, proliferation, and/or differentiation of one or more types or subtypes of immune system cells that would normally contribute to proper regulation of the immune system so as to inhibit autoimmunity or inflammation. In some embodiments the gene encodes a transcription factor that contributes to the establishment or maintenance of cell identity of such immune cells.

Autoimmune diseases include, for example, acute disseminated encephalomyelitis, alopecia areata, antiphospholipid syndrome, autoimmune hepatitis, autoimmune myocarditis, autoimmune pancreatitis, autoimmune polyendocrine syndromes, autoimmune uveitis, inflammatory bowel disease (Crohn's disease, ulcerative colitis), type I diabetes mellitus (e.g., juvenile onset diabetes), multiple sclerosis, scleroderma, ankylosing spondylitis, sarcoid, pemphigus vulgaris, pemphigoid, psoriasis, myasthenia gravis, systemic lupus erythematosus, rheumatoid arthritis, juvenile arthritis, psoriatic arthritis, Behcet's syndrome, Reiter's disease, Berger's disease, dermatomyositis, polymyositis, antineutrophil cytoplasmic antibody-associated vasculitides (e.g., granulomatosis with polyangiitis (also known as Wegener's granulomatosis), microscopic polyangiitis, and Churg-Strauss syndrome), Sjögren's syndrome, anti-glomerular basement membrane disease (including Goodpasture's syndrome), dilated cardiomyopathy, primary biliary cirrhosis, thyroiditis (e.g., Hashimoto's thyroiditis, Graves' disease), transverse myelitis, and Guillain-Barré syndrome. Inflammatory diseases include autoimmune diseases and other diseases in which there is excessive or inappropriate inflammation. In some embodiments aberrant DNA methylation may cause or contribute to one or more such disorders.

In some embodiments a DNA region of interest is a regulatory region (e.g., a promoter, enhancer, or superenhancer) or gene body of a cytokine gene, cytokine receptor gene, cytokine regulatory gene, or cytokine-responsive gene. In some embodiments a DNA region of interest is a regulatory region (e.g., a promoter, enhancer, or superenhancer) or gene body of a gene involved in the development, survival, activation, proliferation, and/or differentiation of one or more types or subtypes of immune cells, such as a gene that encodes a transcription factor that contributes to the establishment or maintenance of cell identity of such cells. In some embodiments an RGM construct is integrated into a regulatory region (e.g., a promoter, enhancer, or superenhancer) or gene body of a cytokine gene, cytokine receptor gene, cytokine regulatory gene, cytokine-responsive gene, or gene involved in the development, survival, activation, proliferation, and/or differentiation of one or more types or subtypes of immune cells.

In some embodiments, a DNA methylation reporter may be used to identify an agent that selectively increases or decreases the methylation of a region that is aberrantly methylated in cells from a subject with a disorder associated with aberrant DNA methylation. For example, it would be of interest to identify agents that can selectively cause an increase or decrease in methylation of a region of DNA that is aberrantly hypermethylated in such a disorder or that can selectively cause an increase or decrease in methylation of a region of DNA that is aberrantly hypomethylated in such a disorder. Selectively causing an increase or decrease in methylation of a region of DNA refers to causing an increase or decrease in methylation of the region without significantly affecting the methylation state of most other regions in the genome. In some embodiments a selective agent increases methylation of the region of interest by at least 50% or to a level of at least 80%, 90%, or more, but has no more than a 1%, or in some embodiments no more than a 5%, or in some embodiments no more than a 10% effect on the overall level of methylation in the genome. In some embodiments a selective agent decreases methylation of the region of interest by at least 50% or to a level of no more than 20%, or no more than 10%, but has no more than a 1%, or in some embodiments no more than a 5%, or in some embodiments no more than a 10% effect on the overall level of methylation in the genome. In some embodiments a selective agent decreases methylation of an aberrantly hypermethylated region to an approximately normal level for that region, but has no more than a 1%, or in some embodiments no more than a 5%, or in some embodiments no more than a 10% effect on the overall level of methylation in the genome. In some embodiments a selective agent increases methylation of an aberrantly hypomethylated region to an approximately normal level for that region, but has no more than a 1%, or in some embodiments no more than a 5%, or in some embodiments no more than a 10% effect on the overall level of methylation in the genome. In some embodiments, agents that can selectively cause an increase in methylation of a region of DNA that is aberrantly hypermethylated in cells from a subject with a disorder associated with aberrant DNA methylation can be used to generate a cell-based model or an animal model of the disorder. In some embodiments, agents that can selectively cause a decrease in methylation of a region of DNA that is aberrantly hypomethylated in cells from a subject with a disorder associated with aberrant DNA methylation can be used to generate a cell-based model or an animal model of the disorder. The cell-based or animal model may be used to screen for agents that could be used to treat the disorder.
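By way of non-limiting illustration only, the numerical selectivity criteria set out above might be checked as in the following sketch. Several points are assumptions made for this example rather than statements of the description: methylation levels are expressed as fractions between 0 and 1, "by at least 50%" is read as a relative change in the level at the region of interest, the effect on overall genomic methylation is read as an absolute difference in the genome-wide fraction, and the function name `is_selective` and its parameters are hypothetical.

```python
def is_selective(roi_before, roi_after, global_before, global_after,
                 direction="increase", max_global_effect=0.05):
    """Return True if a change at the ROI meets the illustrative criteria.

    All methylation levels are fractions in [0, 1]. max_global_effect
    corresponds to the 1%, 5%, or 10% bounds (0.01, 0.05, 0.10), here
    interpreted as an absolute difference (an assumption).
    """
    levels = (roi_before, roi_after, global_before, global_after)
    if not all(0.0 <= v <= 1.0 for v in levels):
        raise ValueError("methylation levels must be fractions in [0, 1]")

    # Overall genomic methylation must be essentially unchanged.
    global_ok = abs(global_after - global_before) <= max_global_effect

    if direction == "increase":
        changed = roi_after > roi_before
        by_half = roi_before > 0 and (roi_after - roi_before) / roi_before >= 0.5
        roi_ok = changed and (by_half or roi_after >= 0.8)
    elif direction == "decrease":
        changed = roi_after < roi_before
        by_half = roi_before > 0 and (roi_before - roi_after) / roi_before >= 0.5
        roi_ok = changed and (by_half or roi_after <= 0.2)
    else:
        raise ValueError("direction must be 'increase' or 'decrease'")

    return roi_ok and global_ok


if __name__ == "__main__":
    # ROI rises from 20% to 85% while global methylation moves by 2 points.
    print(is_selective(0.20, 0.85, 0.70, 0.72, direction="increase"))
```

Other readings of the percentages (for example, treating the genome-wide bound as a relative change) would require only a change to the corresponding comparison.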

Agents that can cause a selective decrease in methylation of a region of DNA that is hypermethylated in a disorder or that can cause a selective increase in methylation of a region of DNA that is hypomethylated in a disorder can serve as candidate therapeutic agents for treating the disorder. Thus in some embodiments, a DNA methylation reporter may be used to identify a candidate therapeutic agent for a disorder associated with aberrant DNA methylation. Such a candidate therapeutic agent may, for example, cause reactivation of an aberrantly silenced gene such as a tumor suppressor gene (which may have an aberrantly hypermethylated promoter region) or may inhibit expression of an aberrantly expressed gene that causes or contributes to a disorder (e.g., an oncogene, in the case of cancer). The term “oncogene” encompasses nucleic acids that, when expressed, can increase the likelihood of or contribute to cancer initiation or progression. Normal cellular sequences (“proto-oncogenes”) can be activated to become oncogenes (sometimes termed “activated oncogenes”) by mutation and/or aberrant expression. In various embodiments an oncogene can comprise a complete coding sequence for a gene product or a portion that maintains at least in part the oncogenic potential of the complete sequence or a sequence that encodes a fusion protein. Oncogenic mutations can result, e.g., in altered (e.g., increased) protein activity, loss of proper regulation, or an alteration (e.g., an increase) in RNA or protein level. Aberrant expression may occur, e.g., due to chromosomal rearrangement resulting in juxtaposition to regulatory elements such as enhancers, epigenetic mechanisms, or due to amplification, and may result in an increased amount of proto-oncogene product or production in an inappropriate cell type. Proto-oncogenes often encode proteins that control or participate in cell proliferation, differentiation, and/or apoptosis. These proteins include, e.g., various transcription factors, chromatin remodelers, growth factors, growth factor receptors, signal transducers, and apoptosis regulators. A tumor suppressor gene (TSG) may be any gene wherein a loss or reduction in function of an expression product of the gene can increase the likelihood of or contribute to cancer initiation or progression. Loss or reduction in function can occur, e.g., due to mutation or epigenetic mechanisms. Many TSGs encode proteins that normally function to restrain or negatively regulate cell proliferation and/or to promote apoptosis. Exemplary oncogenes include, e.g., MYC, SRC, FOS, JUN, MYB, RAS, RAF, ABL, ALK, AKT, TRK, BCL2, WNT, HER2/NEU, EGFR, MAPK, ERK, MDM2, CDK4, GLI1, GLI2, IGF2, etc. Exemplary TSGs include, e.g., RB, TP53, APC, NF1, BRCA1, BRCA2, PTEN, CDK inhibitory proteins (e.g., p16, p21), PTCH, WT1, Polo-like kinases, SFRP1, HHIP, SOCS1, CASP8, RASSF1A, etc. It will be understood that a number of these oncogene and TSG names encompass multiple family members and that many other oncogenes and TSGs are known. In some embodiments a ROI is a promoter region of a TSG, e.g., a TSG characterized in that hypermethylation of its promoter region is found in cancer.

In some embodiments the disorder associated with aberrant DNA methylation is an imprinting disorder. Imprinting disorders can sometimes result from loss of function of the allele of an imprinted gene that is normally expressed. Loss of function of the allele that is normally expressed may occur due to deletion, mutation, hypermethylation, or other causes. The other allele (the imprinted allele) may be normal, but is silenced by imprinting. In some embodiments, a candidate therapeutic agent may be one that could cause demethylation of a DMR (e.g., a DMR that acts as an ICR) that causes the silencing of the imprinted allele. In some embodiments, such an agent may be identified by integrating a DNA methylation reporter in proximity to the DMR in the chromosome in which the DMR normally acts to silence the imprinted allele. Test agents are screened to identify one or more agents that cause expression of the reporter molecule. In some embodiments such an agent may then be tested to determine its effect on expression of the imprinted allele by, e.g., directly measuring a gene product of the imprinted allele.

In some embodiments the imprinting disorder is Beckwith-Wiedemann syndrome (BWS; Online Mendelian Inheritance in Man (OMIM) #130650), a condition that is characterized by macrosomia, macroglossia, abdominal wall defects, and variable minor features. The relevant imprinted chromosomal region in BWS is 11p15.5, which consists of two imprinted domains, IGF2/H19 and CDKN1C/KCNQ1OT1, H19DMR and KvDMR1 being the respective imprinting control regions. Loss of methylation (LOM) at KvDMR1 and gain of methylation (GOM) at H19DMR are causes of BWS. In some embodiments an RGM construct may be integrated into 11p15.5 and used to detect or monitor methylation of H19DMR and/or KvDMR1 and/or to identify an agent that modulates the methylation state.

A candidate therapeutic agent identified according to any of the methods may be tested in isolated cells, e.g., cells obtained from a subject suffering from the disorder of interest or cells that are genetically engineered to harbor one or more mutations that cause or contribute to the disorder. For example, a candidate therapeutic agent for cancer may be tested to determine its effect on the proliferation of cancer cells in vitro. Numerous cancer cell lines are known in the art. An agent that inhibits the proliferation of cancer cells is a candidate therapeutic agent for treating cancer.

A candidate therapeutic agent identified according to any of the methods may be tested in human subjects with the disease or in non-human animals (e.g., animals that serve as a model for a disease) by determining whether the agent alleviates symptoms or signs of the disease or otherwise shows evidence of efficacy. One of ordinary skill in the art is aware of suitable animal models for disorders associated with aberrant DNA methylation and imprinting disorders. For example, a candidate therapeutic agent for treating cancer can be administered to a non-human mammal that serves as an animal model for cancer (e.g., an animal with a spontaneously arising cancer or a cancer that is experimentally produced by, e.g., injecting or otherwise introducing cancer cells into the animal). The effect on one or more properties of the cancer (e.g., cancer development, size, growth rate, rate of metastasis, etc.) is determined.

A subject, e.g., a human subject suffering from a disorder associated with aberrant DNA methylation, may be tested to determine or confirm that the subject suffers from aberrant DNA methylation. In some embodiments, a subject may be tested to determine or confirm that a particular aberrant DNA methylation pattern (e.g., aberrant methylation of a particular genomic region) exists in at least some of the subject's cells prior to administration of an agent that is intended to affect the DNA methylation pattern in that region. In some embodiments, a cancer may be tested to determine or confirm that a particular aberrant DNA methylation pattern (e.g., aberrant methylation of a particular genomic region) exists in at least some of the cancer cells prior to administration of an agent that is intended to affect the DNA methylation pattern in that region. The subject may be tested by obtaining a sample comprising cells from the subject and utilizing standard methods for methylation analysis such as bisulfite sequencing.

In some embodiments contacting comprises administration of an agent to a subject, which may be by any route (e.g., oral, intravenous, intraperitoneal, gavage, topical, transdermal, intramuscular, enteral, subcutaneous), may be systemic or local, may include any dose (e.g., from about 0.01 mg/kg to about 500 mg/kg), and may involve a single dose or multiple doses. An agent may be combined with a physiologically acceptable carrier (e.g., water, saline, 5% dextrose), excipients, or other substances conventionally combined with active agents for administration to a subject.

In some embodiments a genome-wide screen may be performed using cells that have an RGM construct integrated into their genome, e.g., in proximity to an ROI. The genome-wide screen may be performed using a library of test cells that overexpress or substantially lack expression of most (e.g., at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100%) of the genes in a given mammalian genome, i.e., a set of cells, cell populations, or cell lines wherein each cell, cell population, or cell line has one gene that is knocked out (e.g., by a mutation introduced by genome editing, such as through use of CRISPR technology), inhibited by RNAi, or overexpressed. In some embodiments the library of cells is generated by introducing a cDNA expression library, or an shRNA library, or a variomics library into cells or a cell line of interest. In some embodiments at least 10,000 genes, at least 15,000 genes, or more, are tested. In some embodiments the screen is a pooled screen wherein members of the library are cultured together. In some embodiments, cells that have a phenotype of interest are identified and, optionally, separated from the other cells. In some embodiments the particular gene that is altered (e.g., knocked out) or overexpressed in such cells is identified. The different members of the library may bear DNA barcodes to allow them to be readily distinguished. In some embodiments the screen involves different members of the library being cultured in separate vessels, e.g., wells of a microwell plate. The genome-wide screen may, for example, identify genes that regulate or otherwise affect the methylation state of the ROI. In some embodiments the library of cells is subjected to conditions that would normally cause an alteration in the methylation state of the ROI. Cells in which such alteration fails to occur may be identified. In some embodiments the screen may identify one or more genes that are essential for methylation of a ROI. In some embodiments, e.g., where aberrant methylation of the ROI occurs in a disorder, such genes may be targets for drug discovery, e.g., discovery of agents that modulate expression or activity of the genes and thus modulate methylation of the ROI.
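
By way of illustration only, a pooled screen with barcoded library members might be read out by comparing how often each barcode appears among sorted cells showing the phenotype of interest (e.g., reporter-positive cells) versus the unsorted pool. The following Python sketch is a hypothetical example and not a method described in the specification; the function name `barcode_enrichment`, the pseudocount, and the barcode labels are assumptions introduced for illustration.

```python
import math
from collections import Counter


def barcode_enrichment(sorted_reads, pool_reads, pseudocount=1.0):
    """Return log2 enrichment of each barcode in the sorted fraction vs. the pool.

    sorted_reads, pool_reads: lists of barcode strings, one entry per sequenced read.
    pseudocount: hypothetical smoothing value that avoids division by zero
    for barcodes absent from one of the two samples.
    """
    sorted_counts = Counter(sorted_reads)
    pool_counts = Counter(pool_reads)
    barcodes = set(sorted_counts) | set(pool_counts)

    # Normalize each sample to frequencies so that sequencing depth cancels out.
    n_sorted = len(sorted_reads) + pseudocount * len(barcodes)
    n_pool = len(pool_reads) + pseudocount * len(barcodes)

    scores = {}
    for bc in barcodes:
        f_sorted = (sorted_counts[bc] + pseudocount) / n_sorted
        f_pool = (pool_counts[bc] + pseudocount) / n_pool
        scores[bc] = math.log2(f_sorted / f_pool)
    return scores


if __name__ == "__main__":
    pool = ["BC_A"] * 10 + ["BC_B"] * 10      # hypothetical unsorted library
    hits = ["BC_A"] * 18 + ["BC_B"] * 2       # hypothetical reporter-positive sort
    for barcode, score in sorted(barcode_enrichment(hits, pool).items()):
        print(barcode, round(score, 2))
```

A positive score suggests that cells carrying the corresponding perturbation are over-represented in the sorted fraction; in practice, replicate measurements and statistical testing would be needed before identifying a gene as a candidate.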

In some embodiments, DNA methylation, detected using an RGM, can be used as a readout to distinguish and/or isolate different cell types. In some embodiments, cell types that are relevant for purposes such as regenerative medicine and/or cell transplantation (e.g., beta cells, neurons, or other cell types mentioned herein) may be identified or isolated. For example, cells may have an RGM construct integrated into their genome in a ROI whose methylation state (e.g., hypermethylated or hypomethylated) is characteristic of a given cell type of interest. Cells in which the ROI has a methylation state characteristic of the cell type of interest may be isolated from the population. In some embodiments cells can be subjected to reprogramming, transdifferentiation, or differentiation and then analyzed to determine the methylation level of the ROI. Cell types of interest may then be isolated from the population. In some embodiments, the RGM construct may be integrated into the genome flanked by sites for a site-specific recombinase. If desired the RGM construct may be excised by expressing or delivering the recombinase to the cells.

All patents, patent applications, and publications (e.g., scientific articles, books, websites, and databases) mentioned herein are incorporated by reference in their entirety. It is also noted that the references cited in the various references cited herein are also considered to be incorporated herein. In case of a conflict between the specification and any of the incorporated references, the specification (including any amendments thereof, which may be based on an incorporated reference), shall control. Complete citations for certain references cited in the application are collected in the Reference List.

One skilled in the art readily appreciates that the present invention is well adapted to carry out the objects and obtain the ends and advantages mentioned, as well as those inherent therein. The details of the description and the examples herein are representative of certain embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Modifications therein and other uses will occur to those skilled in the art. These modifications are encompassed within the spirit of the invention. It will be readily apparent to a person skilled in the art that varying substitutions and modifications may be made to the invention disclosed herein without departing from the scope and spirit of the invention.

The articles “a”, “an”, and “the” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to include the plural referents. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. Embodiments are disclosed in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. Embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process are also disclosed. Furthermore, it is to be understood that disclosed herein are all variations, combinations, and permutations in which one or more limitations, elements, clauses, descriptive terms, etc., from one or more of the listed claims is introduced into another claim dependent on the same base claim (or, as relevant, any other claim) unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise. It is contemplated that all embodiments described herein are applicable to all different aspects described herein where appropriate. It is also contemplated that any of the embodiments or aspects can be freely combined with one or more other such embodiments or aspects whenever appropriate. Section headings are for convenience only and not intended to limit the disclosure in any way. Where elements are presented as lists, e.g., in Markush group or similar format, it is to be understood that each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where an aspect or embodiment is referred to as comprising particular elements, features, etc., certain aspects and embodiments could consist, or consist essentially of, such elements, features, etc. For purposes of simplicity those aspects and embodiments may not in every case have been specifically set forth in so many words herein. It should also be understood that any embodiment or aspect can be explicitly excluded from the claims, regardless of whether the specific exclusion is recited in the specification. For example, any one or more reporter molecules, reporter genes, regions of interest, nucleic acids, polypeptides, cells, species or types of organism, agents, disorders, subjects, or combinations thereof, can be excluded.

Where the claims or description relate to a composition of matter, e.g., a nucleic acid, polypeptide, cell, or non-human animal, it is to be understood that methods of making, obtaining, or using the composition of matter according to any of the methods disclosed herein, and methods of using the composition of matter for any of the purposes disclosed herein are disclosed, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise. Where the claims or description relate to a method, it is to be understood that methods of making compositions useful for performing the method, and products produced according to the method, are disclosed, unless otherwise indicated or unless it would be evident to one of ordinary skill in the art that a contradiction or inconsistency would arise. Unless clearly indicated to the contrary, in any methods described or claimed herein that include more than one act, the order of the acts of the method is not necessarily limited to the order in which the acts of the method are recited, but the disclosure includes embodiments in which the order is so limited. Where the disclosure refers to a method it should be understood that any components needed or useful for performing the method can be provided and that the method can be performed under appropriate conditions and for an appropriate time to achieve a desired result or outcome. Unless otherwise indicated or evident from the context, any product or composition described herein may be considered “isolated” or “purified”.

Where ranges are given herein, embodiments are disclosed in which the endpoints are included, embodiments are disclosed in which both endpoints are excluded, and embodiments are disclosed in which one endpoint is included and the other is excluded. It should be assumed that both endpoints are included unless indicated otherwise. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

Where a series of numerical values (e.g., a percentage) is stated herein, the disclosure discloses embodiments that relate to any intervening value or range defined by any two values in the series. The lowest value may be taken as a minimum and the greatest value may be taken as a maximum. Where a set of ranges is set forth, the disclosure discloses embodiments that relate to any range that encompasses any two or more of the ranges, using either endpoint of the lowest range as the lower endpoint and either endpoint of the highest range as the higher endpoint. For example, where the ranges 1-10, 10-100, 100-500, 500-1000, 1000-2000 are recited, ranges such as 1-100, 1-500, 10-500, 100-1000, 500-2000, 1-2000, etc., are disclosed. Furthermore, where a set of lower endpoints and a set of higher endpoints for a range are set forth, the disclosure encompasses embodiments that relate to all possible ranges that have a member of the set of lower endpoints as a lower endpoint and a member of the set of upper endpoints as an upper endpoint, i.e., all combinations of lower endpoints and higher endpoints are disclosed. For example, if a nucleic acid sequence is said to extend from −300, −200, or −100 to +1, +100, or +200, the disclosure provides embodiments that extend from −300 to +1, −300 to +100, −300 to +200, −200 to +1, −200 to +100, −200 to +200, −100 to +1, −100 to +100, and −100 to +200.
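
The combination of lower and upper endpoints described above amounts to a Cartesian product of the two sets. The short Python sketch below merely illustrates that enumeration for the −300/−200/−100 and +1/+100/+200 example; the function name `endpoint_combinations` is a hypothetical label chosen for illustration.

```python
from itertools import product


def endpoint_combinations(lower_endpoints, upper_endpoints):
    """Pair every lower endpoint with every upper endpoint, in listed order."""
    return list(product(lower_endpoints, upper_endpoints))


if __name__ == "__main__":
    # Positions relative to the transcription start site, as in the example above.
    for lower, upper in endpoint_combinations([-300, -200, -100], [1, 100, 200]):
        print(f"{lower:+d} to {upper:+d}")
```

With three lower endpoints and three upper endpoints, the enumeration yields the nine ranges listed in the preceding paragraph.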

EXAMPLES

Example 1 A Methylation Sensitive Reporter System Based on a Minimal Promoter

We set out to generate a DNA methylation reporter system that is capable of visualizing genomic methylation states at single cell resolution. The design of the reporter was based on two premises: (i) previous observations suggesting that CpG sites can serve as cis-acting signals, affecting the methylation state of adjacent CpGs (Brandeis et al., 1994; Mummaneni et al., 1995; Turker, 2002); (ii) a methylation sensitive promoter, when introduced in proximity to a CpG region of choice, may be utilized to report on methylation changes of the adjacent sequences. Thus, an important issue in establishing a DNA methylation reporter was identifying a methylation sensitive promoter that can be affected by exogenous methylation changes without being independently regulated by the DNA methylation machinery. Constitutively active genes usually contain hypomethylated high density CpG islands (CGIs) in their promoter regions and are not regulated by DNA methylation (Deaton and Bird, 2011), whereas gene promoters associated with low density CGIs are activated and repressed in a tissue-specific manner. Because both classes of promoters are either not affected by DNA methylation or are regulated by the DNA methylation machinery in a tissue-dependent manner, these promoters are typically not well suited for use as DNA methylation reporters. In contrast, imprinted gene promoters exhibit inherent sensitivity to DNA methylation of adjacent genomic regions, resulting in transcriptional activation or silencing. This mechanism has been established for a subgroup of germline-derived differentially methylated regions (DMRs), sometimes termed “imprinting control regions” (ICRs), that affect in cis the methylation state of secondary regulatory promoter elements, which in turn control imprinted gene activity. Importantly, the methylation state of such regions is subsequently maintained throughout normal development, and is therefore not regulated by the DNA methylation machinery in a tissue-specific manner (Ferguson-Smith, 2011). We hypothesized that these intrinsic characteristics of imprinted gene promoters make them attractive as putative methylation sensors. An example of the phenomenon of imprinting is the Prader-Willi Angelman region, in which a DMR at the Snrpn gene promoter region controls its parent-of-origin monoallelic expression (Buiting et al., 1995; Kantor et al., 2004). Thus, we identified the Snrpn promoter as an attractive candidate to generate a DNA methylation reporter.

To establish a DNA methylation reporter, we generated a synthetic minimal Snrpn promoter that includes the conserved elements between human and mouse and contains the endogenous imprinted DMR region (FIG. 6A). The minimal promoter region driving GFP was cloned into a Sleeping Beauty transposon vector (Ivics et al., 1997) to facilitate stable single copy integration into the genome.

The sequence of the Snrpn minimal promoter is underlined in the following ˜1.5 kb of mouse genomic sequence, in which the “G” residue in bold, italic font is the transcription start site (TSS) for a transcript that encodes the Snrpn protein and the protein known as SNRPN upstream reading frame (Snurf), and the “ATG” in bold, italic font corresponds to the start codon of the transcript:

(SEQ ID NO: 3) GTAGATTAAGAACCAGCCTCAGAAAAGCAACAACAAAATACACACCCTGCAGCGCTGAGCTACACTCCACCATTCCTAGCCCTAGTCTATTGTCTTTTCATTTTTCCATAAGTAGTCTGTCCTTGTGATTTTCATTTGCATTTCCATGGTGACTGACAATAGTATCTAATTGTTTAGTTCTATGTAAATAGATTTCTTCAGCTGTTTTCGAAAGTTCAGGTTTTGGTTACATTTAGAACTGAATGTATCTTCATTGAAGTTGAATTTAGGATGTTTGCGAACTGGATGCTAGCTCAGTGCGGGGGGAAGGGAAGTAGAGAACTTCCAACTTTGTTAGAATACCTCATTAACAGTTCTTGCAGGCCCTCATTAAGCTATGCTAAACCCATGTAAATTTAGCTTCCTTAGTTTTCTCCTTGCCATTTTGTTTTCCTAATCTTCAAATAATTGCATATTGAAGTTACTACCACAATAATACTTTTACTAGGCAGACAGGAAATTAATAGGTCAAAAGTAACTGAAATAAATTCTTATATATGTATCCACAATCTACAAAATGTTTTTGTTTTTGTTTTTAGATATTGTTACAAATTGAACCTGGCCTTGAGTATGCAAAAATACTGCTTTCTTAGAATAAGTTTCCTAAGAGCTGGAATTACTGGATGGCATTTCTATGAGGTCATATATTTGTTAGTAAATAGTGTCTACTTTTCACCCCCCAGGCATAACAACATTTAGGAAGCCCTGTCTCTAAAACCAACAACAACAAAAGAAGCAGATACATAAGTTTCATAACTGAATGTTCTTCCTATTAAAATTTAATCACACCATGATCTGGAGGAAATAGTTTTCTCCCAGTCATATGTTCTAACACAGAGAAAGAAAATACAAGTAATACTACATTAATGTAGAATGTAGAATTAGGAATCAGGATAACTTTTTTTTCTGTACAGAATTTTAAGTATCTGACAATTTGGCTGGGCTTCATGTTTGATTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGATACACTATGTAACATGATATAGCCTAGAAACCAGTCTTCCTCATATTGGAGATCAAACCTTTTTTCCTCTCCCACATAATAAAAATCTGTGTGATGCTTGCAATCACTTGGGAGCAATTTTTTTAAAAAATTAAATGTATTTAGTAATAGGCAATTATATCCATTATTCCAGATTGACAGTGATTTTTTTTTTTTTAATACACGCTCAAATTTCCGCAGTAGGAATGCTCAAGCATTCCTTTTGGTAGCTGCCTTTTGGCAGGACATTCCGGTCAGAGGGACAGAGACCCCTGCATTGCGGCAAAAATGTGCGCATGTGCAGCCATTGCCTGGGACGCATGCGTAGGGAGCCGCGCGACAAACCTGAGCCATTGCGGCAAGACTAGCGCAGAGAGGAGAGGGAGCCGGAGATGCCAGACGCTTGGTTCTGAGGAGTGATTTGCAACGCAATGGAGCGAGGAAGGTCAGCTGGGCTTGTGGATTCT.
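
As an aside, the CpG content of a genomic sequence such as the one above can be summarized computationally. The Python sketch below is offered only as an illustration and is not part of the described experiments; it computes the GC fraction and the commonly used observed/expected CpG ratio (number of CpG dinucleotides multiplied by sequence length, divided by the product of the C and G counts). The function name `cpg_stats` and the short test sequence are hypothetical.

```python
def cpg_stats(sequence):
    """Return (CpG count, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = sequence.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides read 5'->3' on one strand

    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Observed/expected CpG = (CpG count * length) / (C count * G count).
    if c_count and g_count:
        obs_exp = cpg_count * n / (c_count * g_count)
    else:
        obs_exp = 0.0
    return cpg_count, gc_fraction, obs_exp


if __name__ == "__main__":
    # Hypothetical toy sequence; a real analysis would use the full sequence.
    print(cpg_stats("CGCGCGAT"))
```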

Recent studies have demonstrated that different CGI vectors, when stably inserted into mouse embryonic stem cells (mESCs), adopt a methylation pattern that corresponds to the in vivo methylation pattern of the respective endogenous sequence (Sabag et al., 2014). To test whether DNA methylation can propagate into the Snrpn promoter region in vivo, we designed an experimental system in which the CGI regions of the Gapdh and Dazl genes were cloned upstream of our reporter (FIG. 1A). The promoter of Gapdh encompasses a hypomethylated CGI consistent with constitutive expression in all tissues. In contrast, the Dazl promoter-associated CGI is hypermethylated in all tissues excluding the germ cells (Hackett et al., 2013). Given the different expression and methylation patterns of both genes, we hypothesized that upon stable integration of the two reporter vectors into the genome of mESCs the Gapdh CGI would maintain its hypomethylated state, while the Dazl CGI would be subjected to de novo methylation (Sabag et al., 2014). FIG. 6B shows that more than 95% of cells carrying the Gapdh reporter expressed GFP. In contrast, more than 30% of cells carrying the Dazl reporter were GFP negative, corresponding to reporter silencing. The effect of the Dazl reporter becomes more robust upon continued passage, with more than 80% of the cells faithfully silencing their reporter within 4 weeks (FIG. 1B).

To assess the DNA methylation levels of the Gapdh and Dazl reporters following introduction into mESCs, we sorted Gapdh GFP positive and Dazl GFP negative cell populations (FIGS. 1C-1D). The GFP expression state was stable upon continuous culture and passaging of the two sorted cell populations for over 7 weeks (FIG. 1E). DNA was extracted from both Gapdh GFP positive and Dazl GFP negative cells and subjected to bisulfite conversion and PCR sequencing. FIG. 1F shows that Gapdh GFP positive cells maintained the hypomethylated state at both Gapdh CGI and the Snrpn promoter regions, whereas Dazl GFP negative cells became highly de novo methylated at the Dazl CGI region and its corresponding downstream Snrpn promoter (FIG. 1G). These results are consistent with the hypothesis that DNA methylation can be propagated from the CGI into the Snrpn promoter region resulting in repression of transcriptional activity.
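
The logic of reading methylation from bisulfite sequencing can be sketched as follows: because bisulfite treatment converts unmethylated cytosine to uracil (read as T after PCR) while leaving 5-methylcytosine as C, a C retained at a CpG position in a read indicates methylation and a T indicates lack of methylation. The Python sketch below is a simplified illustration, not the analysis pipeline used in these experiments; it assumes reads from the same strand that are already aligned to the reference without insertions or deletions, and the function names and toy sequences are hypothetical.

```python
def cpg_positions(reference):
    """Return 0-based indices of the C in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def cpg_methylation(reference, reads):
    """Per-CpG methylated fraction from bisulfite-converted, pre-aligned reads.

    Assumes each read has the same length as the reference (no indels).
    A 'C' at a CpG position counts as methylated, a 'T' as unmethylated;
    any other base is treated as uninformative and ignored.
    """
    levels = {}
    for pos in cpg_positions(reference):
        methylated = sum(1 for read in reads if read[pos].upper() == "C")
        unmethylated = sum(1 for read in reads if read[pos].upper() == "T")
        informative = methylated + unmethylated
        levels[pos] = methylated / informative if informative else None
    return levels


if __name__ == "__main__":
    reference = "ACGTTCGA"  # hypothetical amplicon with CpGs at positions 1 and 5
    reads = ["ACGTTCGA", "ATGTTTGA", "ACGTTTGA", "ACGTTTGA"]
    print(cpg_methylation(reference, reads))
```

Each read corresponds to one sequenced allele, so the per-position fractions summarize how many alleles in the sampled population were methylated at each CpG.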

Example 2 DNA Methylation Reporter is a Reporter for In Vivo Demethylation

The experiments described in Example 1 showed that the DNA methylation reporter (also referred to as a “reporter of genomic methylation” (RGM)) faithfully reports on de novo methylation imposed in vivo on the unmethylated Dazl CGI donor test sequence. Conversely, we were interested to assess whether a methylated and silent donor Snrpn promoter can be reactivated by means of demethylation acquired in vivo. For this we used the CpG methyltransferase M.SssI to in vitro methylate both Gapdh and Dazl reporter constructs. Treatment of the plasmids with M.SssI enzyme followed by bisulfite conversion, PCR amplification and sequencing, confirmed the complete hypermethylation of both the CGI and Snrpn promoter regions (FIGS. 2A and 2B).

ESCs were transfected with either Gapdh or Dazl reporter and selected for cells carrying stably integrated vectors. Following two weeks of culture we identified robust activation of GFP in virtually all cells carrying the integrated Gapdh reporter. In contrast, cells carrying the Dazl reporter remained GFP negative (FIGS. 2C and 2D). To assess the DNA methylation state of the Gapdh and Dazl CGI and the respective downstream Snrpn promoter regions, DNA was extracted from the two cell lines, subjected to bisulfite conversion, PCR amplification and sequencing. FIG. 2E demonstrates that, consistent with high GFP expression, the Gapdh CGI and its downstream Snrpn promoter had become fully demethylated. In contrast, the Dazl CGI and its downstream Snrpn promoter sequences maintained the hypermethylated state in agreement with complete repression of the GFP signal (FIG. 2F). Thus, our data support the hypothesis that a Snrpn promoter can faithfully report on in vivo demethylation of the CGI in its proximity. These experiments indicate that the Snrpn promoter is a faithful reporter of the methylation state of adjacent sequences.

Example 3 Dnmt1, 3a and 3b Mediate Methylation and Reporter Activity

We used ESCs deficient for the DNA methyltransferases Dnmt1, Dnmt3a and Dnmt3b to gain mechanistic insights into demethylation and de novo methylation imposed on the Snrpn promoter in transfected ESCs. FIG. 2G shows that introduction of an in vitro methylated Dazl Snrpn vector into Dnmt1 mutant cells resulted in about 50% GFP positive cells in contrast to no GFP positive cells when inserted into wild type (wt) cells. Because Dnmt1 is the maintenance DNA methyltransferase (Li et al., 1992), this result indicates that reactivation of the methylated Dazl reporter in Dnmt1 deficient cells occurred by passive demethylation. To clarify the mechanism of de novo methylation, we introduced an unmethylated version of both vectors into mESCs deficient for both de novo DNA methyltransferases Dnmt3a and Dnmt3b (Pawlak and Jaenisch, 2011). FIG. 2H shows that the vast majority of cells carrying the Dazl or the Gapdh reporters were positive for GFP unlike reporter expression in control V6.5 cells (FIG. 1B and FIG. 6B), which is consistent with Dnmt3a/b mediating de novo methylation and reporter silencing.

Recent studies have shown that culturing mESCs in 2i medium (inhibitors of MEK and GSK3) and leukemia inhibitory factor (LIF) results in downregulation of Dnmt3a and Dnmt3b, consequently leading to global hypomethylation (Lee et al., 2014). To assess whether these culture conditions affect reporter activity, we transfected the unmethylated Gapdh and Dazl reporters into wt mESCs cultured in 2i and LIF. FIG. 2I shows that the great majority of the stably transfected cells were GFP positive, consistent with 2i-mediated downregulation of Dnmt3a and Dnmt3b.

Example 4 RGM can Report on Methylation of Pluripotency Specific Superenhancers

Pluripotency master transcription factors, together with Mediator, have been shown to form superenhancers (SE) at key pluripotency genes (Dowen et al., 2014; Whyte et al., 2013). Comparing ChIP-seq and DNA methylation data demonstrates that the enhancer marks of the pluripotent-specific SE miR290 and Sox2 are active and non-methylated in mESCs but methylated and not active in somatic cells (FIG. 3A and FIG. 7A). We assessed whether RGM could be used for monitoring tissue-specific DNA methylation changes of miR290 and Sox2 SE regions. For this, we inserted, utilizing CRISPR/Cas mediated gene editing, a Snrpn tdTomato reporter into the endogenous miR290 and Sox2 superenhancers (FIG. 3B and FIG. 7B, respectively) using as recipient cells the previously established Oct4, Sox2, Klf4 and c-Myc (OSKM) polycistronic dox-inducible secondary reprogrammable mESCs (Carey et al., 2011; USSN), which also carried a GFP reporter knocked into the endogenous Nanog locus. Correct integration of the vector was validated by PCR and Southern analysis (FIG. 7C). FIG. 3C shows that both targeted ESC lines (miR290 #21 and Sox2 #2) expressed tdTomato as well as Nanog-GFP. To assess whether the tdTomato expression correlated with hypomethylation of the inserted RGM, DNA extracted from the bulk mESC population was bisulfite converted, amplified by PCR and sequenced, with the PCR amplification including both the SE CpG region and the downstream Snrpn promoter. As predicted from the methylation maps (FIG. 3A and FIG. 7A), both endogenous miR290 and Sox2 CpG regions were mostly hypomethylated (FIG. 3D). Importantly, the Snrpn promoter was also hypomethylated (FIG. 3D) consistent with reporter expression. Of note, a few highly methylated alleles were detected (FIG. 3D), possibly reflecting an inherent variation in the bulk population due to the presence of cells that carry an inactive reporter. We conclude that RGM can report on the methylation state of distal genomic regulatory regions.

Example 5 Dynamic De Novo DNA Methylation During Differentiation

We investigated whether RGM also allows tracing of real-time changes in genomic DNA methylation during in vitro differentiation. ESCs carrying the tdTomato reporters reflecting DNA methylation levels at the SE regions were exposed to Retinoic Acid (RA), which induces a rapid exit from pluripotency and cellular differentiation (Rhinn and Dolle, 2012). The presence of the Nanog-GFP reporter allowed monitoring exit from pluripotency by loss of GFP expression. Sorted double positive (tdTomato⁺/GFP⁺) miR290 and Sox2 cells were plated on feeder-free gelatin coated plates, treated with 0.25 µM RA the following day (FIG. 4A) and analyzed at different times after addition of RA (FIGS. 4A and 4B). As expected, undifferentiated cells were double positive (tdTomato⁺/GFP⁺). Upon induction of differentiation a gradual reduction in the fraction of double positive cells was observed with most disappearing over the time course of 7 days, resulting in a largely double negative cell population (FIGS. 4B and 4C). However, tdTomato and Nanog-GFP positive cells disappeared with different kinetics: while singly tdTomato positive cells (tdTomato⁺/GFP⁻) appeared after 2 days, few, if any, singly Nanog-GFP positive cells (tdTomato⁻/GFP⁺) were detected during differentiation (FIGS. 4B and 4C). This suggested that Nanog was silenced prior to methylation and silencing of the miR290 and Sox2 SEs.
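
The four populations discussed above (double positive, the two single positive classes, and double negative) correspond to quadrants of a two-color flow cytometry plot. The Python sketch below illustrates, under stated assumptions, how per-cell fluorescence values could be assigned to those quadrants and summarized as fractions. It is not the analysis used in these experiments; the function name `quadrant_fractions`, the gate thresholds, and the intensity values are hypothetical.

```python
LABELS = ("tdTomato+/GFP+", "tdTomato+/GFP-", "tdTomato-/GFP+", "tdTomato-/GFP-")


def quadrant_fractions(cells, tomato_gate, gfp_gate):
    """Return the fraction of cells in each tdTomato/GFP quadrant.

    cells: iterable of (tdTomato intensity, GFP intensity) pairs, one per cell.
    tomato_gate, gfp_gate: hypothetical thresholds; a value at or above the
    threshold is scored as positive for that reporter.
    """
    counts = dict.fromkeys(LABELS, 0)
    total = 0
    for tomato, gfp in cells:
        tomato_state = "tdTomato+" if tomato >= tomato_gate else "tdTomato-"
        gfp_state = "GFP+" if gfp >= gfp_gate else "GFP-"
        counts[f"{tomato_state}/{gfp_state}"] += 1
        total += 1
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units for four cells.
    cells = [(900, 800), (850, 50), (40, 30), (20, 10)]
    for label, fraction in quadrant_fractions(cells, 100, 100).items():
        print(label, fraction)
```

Applying the same gates at each time point after RA addition would give the kind of population time course described in this Example.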

To confirm that loss of the tdTomato signal correlated with accumulation of de novo methylation in both SE regions, we sorted the three main populations following 48 hours of RA differentiation (FIG. 4C). DNA was extracted from the three cell populations (tdTomato⁺/GFP⁺, tdTomato⁺/GFP⁻ and tdTomato⁻/GFP⁻) and subjected to bisulfite sequencing. FIGS. 4D and 4E show the methylation state of both the endogenous miR290 and Sox2 SE and their respective Snrpn promoter regions. In contrast to the bulk population of mESCs (FIG. 3D), the sorted double positive cells did not harbor completely methylated alleles, consistent with the notion that methylated alleles in the bulk population represent intrinsic variation. The methylation of both miR290 (FIG. 4D) and Sox2 (FIG. 4E) in single positive cells (tdTomato⁺/GFP⁻) was low, consistent with tdTomato expression. The overall increased de novo methylation in the single positive cells, compared with the double positive cells, may suggest that this intermediate cell population is both transient and unstable. Finally, in agreement with the silencing of tdTomato expression, the double negative cells (tdTomato⁻/GFP⁻) exhibited robust hypermethylation on both endogenous SE regions and their respective Snrpn promoters (FIGS. 4D and 4E). Our data suggest that RGM can report on in vivo acquired methylation of genomic sequences upon exiting from pluripotency, and that the differentiation of ESCs induces silencing of Nanog prior to de novo methylation of the two miR290 and Sox2 SEs.

To test whether in vivo differentiation resulted in silencing of the tdTomato reporter in both miR290 and Sox2 SE regions, we analyzed 13.5 dpi chimeric embryos. As control, we injected ESCs harboring the Gapdh CGI reporter driving a GFP sequence (FIG. 1A), which had also been infected with lentiviruses resulting in constitutive expression of tdTomato. The robust expression of GFP in the Gapdh control embryos demonstrated the widespread expression signature of the Snrpn promoter throughout mouse tissues (FIG. 5A). Unlike the Gapdh control, both miR290 and Sox2 embryos were completely negative for both GFP and tdTomato, demonstrating robust repression of Nanog and the Snrpn promoter (respectively) during in vivo differentiation (FIG. 5A).

Example 6 DNA Demethylation During Cellular Reprogramming

Reprogramming of somatic cells to iPS cells involves demethylation and activation of the pluripotency SEs Sox2 and miR290 (compare FIGS. 3A and 7A). We investigated whether RGM could be used to capture demethylation events that are gradually acquired during cellular reprogramming. For this we used secondary Dox-inducible reprogrammable MEFs isolated from 13.5 dpi chimeric embryos that had been injected at the blastocyst stage with the OSKM DOX inducible ESCs (Carey et al., 2011) carrying Nanog-GFP and the tdTomato reporter reflecting DNA methylation levels at the Sox2 or miR290 SE alleles (see FIG. 5A). Culture of these MEFs in DOX induces the reprogramming factors while Nanog-GFP activation allows monitoring the course of reprogramming in the bulk somatic cell population (Buganim et al., 2012). As expected, MEFs isolated from 13.5 dpi embryos were negative for both GFP and tdTomato expression, as measured by FACS analysis (FIG. 5C and FIG. 8A). Importantly, consistent with tdTomato repression, both endogenous miR290 and Sox2 SE regions as well as their corresponding downstream Snrpn promoter regions were hypermethylated (FIG. 5D).

To test whether reprogramming-induced demethylation can be visualized by RGM, we treated the secondary MEFs with serum and LIF medium supplemented with 2 µg/ml doxycycline (Dox). While both miR290 and Sox2 MEFs were successfully reprogrammed, resulting in double positive cells (tdTomato⁺/GFP⁺, data not shown), the overall course of reprogramming was protracted and highly inefficient, making it difficult to assess the reporter dynamics. It was recently shown that a combination of three chemicals, TGF-β antagonist ALK5 inhibitor II; GSK3b antagonist CHIR99021 and Ascorbic Acid, an enzymatic cofactor (from here on referred to as 3C), results in more efficient and synchronous reprogramming (Vidal et al., 2014). We reprogrammed both miR290 and Sox2 MEFs using 3C culture conditions and monitored the dynamics of reporter activation by flow cytometry. While the first expression of tdTomato⁺ and GFP⁺ cells emerged at day 16 (FIG. 5E), reporter activation of both miR290 and Sox2 occurred with different kinetics. FIG. 5E shows accumulation of miR290 reporter cells that activated both GFP and tdTomato (tdTomato⁺/GFP⁺) over time. A small population of single positive GFP cells appeared in late stages of reprogramming consistent with a stochastic sequence of events in the reprogramming of the miR290 SE region. Compared with miR290 reporter cells (i.e., cells bearing RGM in the miR290 SE), Sox2 cells (i.e., cells bearing RGM in the Sox2 SE) showed a more robust and defined dynamics of activation of both reporters (GFP and tdTomato). By day 16 a population of single positive GFP cells (tdTomato⁻/GFP⁺) had accumulated, which gradually shifted to become double positive (tdTomato⁺/GFP⁺) over time (FIG. 5E and FIG. 8B).

Our results suggest that reprogramming of both the miR290 and Sox2 SE regions is a late event, with the Sox2 SE region being reprogrammed subsequent to the activation of endogenous Nanog. miR290 and Sox2 double positive (tdTomato⁺/GFP⁺) cells invariably proceed to a Dox-independent iPS cell state (FIG. 5F). To assess the methylation state of the Sox2 and miR290 SEs, we performed bisulfite sequencing on DNA extracted from sorted double positive (tdTomato⁺/GFP⁺) iPS cells. As shown in FIG. 5G, both miR290 and Sox2 SE regions, and their corresponding downstream Snrpn promoters, were demethylated. These results suggest that RGM can faithfully visualize demethylation of regulatory genomic regions during reprogramming at single cell resolution.

DISCUSSION

We have generated a DNA methylation reporter (RGM) that allows real time imaging of DNA methylation with single cell resolution. The design of the reporter system took advantage of the intrinsic characteristics of imprinted gene promoters, for which the transcriptional activity reflects the DNA methylation state of adjacent sequences. Importantly, imprinted promoters are neutral to developmental or tissue specific DNA methylation changes, with their activity strictly dependent on the methylation state of the adjacent regulatory elements. This is in contrast to CGI sequences such as Gapdh or tissue-specific elements such as the Dazl promoter associated sequences, which become demethylated or de novo methylated, respectively, when inserted into the genome of ESCs (Brandeis et al., 1994; Sabag et al., 2014). This indicates that methylation of these elements, as opposed to imprinted promoters, is sequence-dependent and subject to trans-acting signals and cell state-dependent regulation.

The RGM reporter system described here is based on the Snrpn minimal promoter that does not induce methylation changes by itself but drives GFP expression solely dependent on the methylation state of surrounding sequences. Consistent with this premise, ES cells appeared GFP positive when stably transfected with the methylated or unmethylated Gapdh/Snrpn-GFP vector, but were GFP negative when transfected with the methylated or unmethylated Dazl/Snrpn-GFP reporter. This indicates that the Snrpn promoter region can be used as a faithful sensor for regional methylation changes of adjacent sequences.

To investigate whether RGM can report on the methylation state of endogenous loci we chose two pluripotent-specific SEs that are upstream of the miR290 and Sox2 genes, and that are known to be active and unmethylated in ESCs but become methylated and inactive upon cellular differentiation. CRISPR/Cas-mediated insertion of the Snrpn-tdTomato reporter into ESCs resulted in tdTomato positive clones, but tdTomato expression was silenced in mid-gestation chimeric embryos, reflecting the unmethylated state of the SEs in pluripotent cells and their de novo methylation upon induction of differentiation. Consistent with this, MEFs isolated from chimeric embryos were tdTomato negative with both elements highly methylated. Upon conversion of the MEFs into iPSCs, however, the cells became tdTomato positive, reflecting demethylation of the SEs during reprogramming to pluripotency. Our results establish that RGM reporter activity faithfully mirrors the changes of DNA methylation imposed on endogenous genomic elements during development, upon cellular differentiation and during reprogramming.

Changes in DNA methylation during development, lineage commitment and disease are dynamic, and studies of epigenetic changes have been hampered by two experimental constraints that limit mechanistic studies of methylation and gene regulation: (i) current methodology (standard methods for methylation analysis used in the art prior to the present disclosure) provides only a static “snapshot” view of the methylation state during cell state transitions, and (ii) current methylation analyses require the examination of multiple cells, precluding assessment of epigenetic changes in single cells. Given the overwhelming evidence of cell-cell heterogeneity in embryos, cultured cells or disease states such as cancer (Junker and van Oudenaarden, 2014), this is a serious limitation for a mechanistic understanding of the epigenetic state and gene expression during these complex processes. The RGM reporter system overcomes some of the limitations of conventional methylation analyses by providing real time visualization of DNA methylation at single cell resolution.

Reprogramming of somatic cells into iPSCs involves extensive resetting of the epigenome (Buganim et al., 2013; Hanna et al., 2010), and consistent with this notion, recent studies identified a key role for epigenetic modifiers during this process (Mansour et al., 2012; Rais et al., 2013; Soufi et al., 2012). However, the exact kinetics of these epigenetic changes during the reprogramming process are difficult to define because of cell heterogeneity and the stochastic nature of the reprogramming process. Here we followed the methylation changes of two SEs associated with Sox2 and miR290, both of which are methylated and inactive in somatic cells but are unmethylated and activated in iPS and ES cells. Utilizing RGM we show that demethylation of both the miR290 and Sox2 SEs is a late event in the reprogramming process. Simultaneous activation of endogenous Nanog and miR290 SE demethylation is consistent with Nanog directly regulating the expression of the miR290 cluster during reprogramming to iPS cells (Gingold et al., 2014). The gradual activation of the Sox2 tdTomato reporter followed expression of endogenous Nanog, consistent with demethylation of the Sox2 SE being a late event in the process (Buganim et al., 2012).

As RGM allows measuring dynamics of DNA methylation at single-cell resolution, it provides a framework for understanding epigenetic changes during cell state transition in heterogeneous cell populations. For example, replacing the fluorescent protein in the reporter system with Cre-Lox will enable the generation of epigenetic lineage tracing maps. Furthermore, utilizing RGM together with conventional gene expression reporters may offer detailed insights into the interplay between epigenetic cues and the execution of tissue-specific gene expression programs. The use of fluorescent reporters (or other reporters) as readout for locus-specific methylation changes may also provide an effective screening platform for the isolation of small molecule compounds that affect the methylation state of specific genomic regions.

Materials and Methods

mESCs Cell Culture

V6.5 mouse embryonic stem cells (mESCs) were cultured on irradiated mouse embryonic fibroblasts (MEFs) with standard ESC medium: (500 ml) DMEM supplemented with 10% FBS (Hyclone), 10 ug recombinant leukemia inhibitory factor (LIF), 0.1 mM beta-mercaptoethanol (Sigma-Aldrich), penicillin/streptomycin, 1 mM L-glutamine and 1% nonessential amino acids (all from Invitrogen). For experiments in 2i culture conditions, mESCs were cultured on gelatin-coated plates with N2B27+2i+LIF medium containing (500 ml): 240 ml DMEM/F12 (Invitrogen; 11320), 240 ml Neurobasal media (Invitrogen; 21103), 5 ml N2 supplement (Invitrogen; 17502048), 10 ml B27 supplement (Invitrogen; 17504044), 10 ug recombinant LIF, 0.1 mM beta-mercaptoethanol (Sigma-Aldrich), penicillin/streptomycin, 1 mM L-glutamine and 1% nonessential amino acids (all from Invitrogen), 50 ug/ml BSA (Sigma), PD0325901 (Stemgent, 1 uM), and CHIR99021 (Stemgent, 3 uM).

Plasmid Cloning

To clone the PiggyBac-Insulator-GapdhCGI-Snrpn-GFP-polyA-PGK-PURO-sv40PolyA-Insulator construct, the minimal Snrpn promoter was PCR amplified using primers A1 and A2 (see complete primer list below). The Snrpn PCR fragment was subsequently digested using MfeI and NheI restriction enzymes. The GapdhCGI sequence was PCR amplified using primers A3 and A4, followed by digestion using SbfI and MfeI. A pCR2.1-TOPO-TA cloning vector (Life Technologies) containing a GFP-PolyA-PGK-Puro cassette was digested using SbfI and NheI. Subsequently, these 3 DNA fragments were cloned using three-way ligation. The resulting GapdhCGI-Snrpn-GFP-PolyA-PGK-Puro cassette was then cloned into a PiggyBac transposon using the restriction enzymes SbfI and SacII to generate the PiggyBac-Insulator-GapdhCGI-Snrpn-GFP-polyA-PGK-PURO-sv40PolyA-Insulator vector. For the PiggyBac-Insulator-DazlCGI-Snrpn-GFP-polyA-PGK-PURO-sv40PolyA-Insulator construct, the same method was used, except that the DazlCGI DNA fragment was PCR amplified using primers A5 and A6.
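The restriction sites used in this cloning scheme are carried in the 5′ tails of the primers, which can be checked with a short script. The following is a minimal sketch: the recognition sequences for MfeI, NheI and SbfI are the standard ones, the primer sequences are copied from primers A1 to A4 in the cloning primer list, and the function and variable names are illustrative only.

```python
# Standard recognition sequences of the enzymes used for the three-way ligation.
RECOGNITION_SITES = {
    "MfeI": "CAATTG",
    "NheI": "GCTAGC",
    "SbfI": "CCTGCAGG",
}

# Primers A1-A4, copied from the cloning primer list.
PRIMERS = {
    "A1 snrpnF-mfe": "aattaacaattgACGCTCAAATTTCCGCAGTAGG",
    "A2 snrpnR-nhe": "aattaaGCTAGCAGAATCCACAAGCCCAGCTG",
    "A3 gapdhF-sbf": "AATTAACCTGCAGGAGCCGAGAGGAATGAGGTTAGTC",
    "A4 gapdhR-mfe": "AATTAACAATTGGAGAGAGGCCCAGCTACTCG",
}


def find_sites(primer):
    """Map each enzyme to the 0-based position of its first site in the primer."""
    seq = primer.upper()  # some primers are written with lowercase tails
    return {
        enzyme: seq.find(site)
        for enzyme, site in RECOGNITION_SITES.items()
        if site in seq
    }


sites_by_primer = {name: find_sites(seq) for name, seq in PRIMERS.items()}
```

Each primer carries exactly one of the three sites, placed directly after a six-nucleotide leader, which agrees with the enzyme pairs named for each fragment.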

To clone the miR290 super enhancer (SE) targeting vector, the 5′ homology arm was PCR amplified using primers B1 and B2, and this DNA fragment was then digested using SbfI and MfeI restriction enzymes. The 3′ homology arm was PCR amplified using primers B3 and B4, followed by digestion with AscI and FseI restriction enzymes. Both homology arms were subsequently ligated with a Snrpn-tdTomato-PolyA-PGK-Puro fragment that had been digested with NheI and AscI restriction enzymes, and a pCR2.1-TOPO-TA cloning vector (Life Technologies) backbone that had been digested with SbfI and FseI. To clone the Sox2 SE targeting vector, the same method was used except that the 5′ homology arm was amplified using primers C1 and C2, and the 3′ homology arm was amplified using primers C3 and C4.

CRISPR oligonucleotides were ligated into the px330 vector using the BbsI restriction site as previously described (Wang et al., 2013). For the miR290 SE region, oligonucleotides D3 and D4 were used, and for the Sox2 SE region, oligonucleotides D1 and D2 were used (see complete primer list below).
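The structure of the CRISPR oligonucleotide pairs can be sketched as follows. This is an illustration under stated assumptions, not a description of how the oligonucleotides were actually designed: the CACC and AAAC overhangs and the rule of prepending a G when the guide does not start with one are inferred from oligonucleotides D1 to D4 in the primer list, and the two 20-nucleotide guide sequences are likewise inferred from D1 and D3. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def guide_to_oligos(guide):
    """Return (top, bottom) oligos for annealing and ligation into a BbsI-cut vector.

    Assumed convention (inferred from D1-D4): the top oligo is CACC + insert,
    the bottom oligo is AAAC + reverse complement of the insert, and the
    insert gets a leading G only if the guide does not already begin with G.
    """
    guide = guide.upper()
    insert = guide if guide.startswith("G") else "G" + guide
    return "CACC" + insert, "AAAC" + reverse_complement(insert)


# Guide sequences inferred from D1 (Sox2 SE) and D3 (miR290 SE).
sox2_top, sox2_bottom = guide_to_oligos("CCAGCTTTCCGAGCCAGATG")
mir290_top, mir290_bottom = guide_to_oligos("GCAGATTTCAGAGCTGATAC")
```

Under these assumptions the function reproduces the D1/D2 and D3/D4 sequences given in the primer list, which is a useful consistency check on the listed oligonucleotides.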

Reporter Cell Lines

To generate stably integrated Gapdh and Dazl reporter cell lines, either the Gapdh- or Dazl-modified PiggyBac transposon (see above) and a helper plasmid expressing transposase were transfected into mESCs using Xfect mESC Transfection Reagent (Clontech), according to the provider's protocol. Stably integrated reporter cells were selected with puromycin (2 ug/ml) for four days.

To generate miR290 and Sox2 SE reporter cell lines, targeting vectors and CRISPR/Cas9 were transfected into mESCs using Xfect mESC Transfection Reagent (Clontech), according to the provider's protocol. 48 hours following transfection, cells were FACS sorted for tdTomato expression and plated on MEF feeder plates. Single colonies were further analyzed for proper integration by Southern blot and PCR analysis.

Flow Cytometry

To assess the proportion of GFP- and tdTomato-positive cells in the established reporter cell lines, a single cell suspension was filtered and assessed on the LSR II SORP, LSRFortessa SORP or FACSCanto II.

Retinoic Acid-Induced Differentiation

mESCs carrying the reporter for both the miR290 and Sox2 SE regions were sorted for double positive GFP and tdTomato expression, and plated on gelatin coated plates in ES cell medium (+LIF). The next day, cells were washed with PBS and resuspended in basal N2B27 medium (2i medium without LIF, Insulin and the two inhibitors), supplemented with 0.25 uM retinoic acid (RA). Medium was replaced every other day.

Blastocyst Injections for the Generation of Chimeras and Secondary MEFs

Blastocyst injections were performed using (C57Bl/6×DBA) B6D2F2 host embryos. In brief, B6D2F1 females were hormone primed by an i.p. injection of PMS (Pregnant Mare Serum Gonadotropin, EMD Millipore) followed 46 h later by an injection of hCG (human Chorionic Gonadotropin, VWR). Embryos were harvested at the morula stage and cultured in a CO₂ incubator overnight. On the day of the injection, groups of embryos were placed in drops of M2 medium and, using a 16 um diameter injection pipet (Origio, Inc.), approximately 10 cells were injected into the blastocoel cavity of each embryo using a Piezo micromanipulator (Prime Tech, Ltd). About 20 blastocysts were subsequently transferred to each recipient female; the day of injection was considered as 2.5 dpc. Fetuses were collected at 13.5 dpc for the extraction of embryonic fibroblasts as described before (Buganim et al., 2012).

Southern Blots

10-15 ug of genomic DNA was digested with appropriate restriction enzymes overnight. Subsequently, genomic DNA was separated on a 0.7% agarose gel, transferred to a nylon membrane (Amersham) and hybridized with ³²P random primer (Stratagene) labeled probes.

Reprogramming to iPSCs

MEFs isolated from miR290 and Sox2 fetuses were plated at a density of 50,000 cells per well of a 6-well plate in gelatin coated plates with standard MEF medium (mESC medium without LIF). The following day MEF medium was replaced with mESC medium containing 2 ug/ml doxycycline (Sigma). Alternatively, cells were grown in mESC medium containing 2 ug/ml doxycycline and a combination of 3 compounds: TGF-β antagonist ALK5 inhibitor II, GSK3β antagonist CHIR99021, and ascorbic acid, as described before (Vidal et al., 2014). Medium was replaced every other day during the course of reprogramming.

Bisulfite Conversion, PCR and Sequencing

Bisulfite conversion of DNA was performed using the EpiTect Bisulfite Kit (Qiagen) following the manufacturer's instructions. The resulting modified DNA was amplified by a first round of nested PCR, followed by a second round using locus-specific PCR primers (see complete list of primers below). The first round of nested PCR was done as follows: 94° C. for 4 min; 55° C. for 2 min; 72° C. for 2 min; Repeat steps 1-3 1×; 94° C. for 1 min; 55° C. for 2 min; 72° C. for 2 min; Repeat steps 5-7 35×; 72° C. for 5 min; Hold 12° C. The second round of PCR was as follows: 95° C. for 4 min; 94° C. for 1 min; 55° C. for 2 min; 72° C. for 2 min; Repeat steps 2-4 35×; 72° C. for 5 min; Hold 12° C. The resulting amplified products were gel-purified, subcloned into a pCR2.1-TOPO-TA cloning vector (Life Technologies), and sequenced.
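How sequenced clones are read out as methylation calls can be illustrated in silico. The sketch below rests on the chemistry summarized in the Background (bisulfite converts unmethylated cytosine to uracil, which is read as thymine after PCR, while 5-methylcytosine is left intact). It models the top strand only and assumes complete conversion; the function names and the toy reference sequence are hypothetical and are not taken from the loci analyzed here.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the top-strand sequence as read after bisulfite treatment and PCR.

    Unmethylated C is read as T; C at a 0-based index listed in
    methylated_positions is protected and stays C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def cpg_methylation(reference, read):
    """Fraction of reference CpG cytosines that still read as C in a clone."""
    reference = reference.upper()
    cpg_positions = [
        i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"
    ]
    if not cpg_positions:
        return None
    retained = sum(1 for i in cpg_positions if read[i] == "C")
    return retained / len(cpg_positions)


# Toy reference with CpGs at indices 1, 5 and 9; two of them are methylated.
reference = "ACGTTCGACCGA"
clone_read = bisulfite_convert(reference, methylated_positions=[1, 9])
fraction_methylated = cpg_methylation(reference, clone_read)
```

Averaging such per-clone fractions over the sequenced clones of one amplicon gives a per-region methylation estimate of the kind compared between MEFs and sorted iPS cells.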

Primer List—Cloning

A1 snrpnF-mfe aattaacaattgACGCTCAAATTTCCGCAGTAGG (SEQ ID NO: 8)
A2 snrpnR-nhe aattaaGCTAGCAGAATCCACAAGCCCAGCTG (SEQ ID NO: 9)
A3 gapdhF-sbf AATTAACCTGCAGGAGCCGAGAGGAATGAGGTTAGTC (SEQ ID NO: 10)
A4 gapdhR-mfe AATTAACAATTGGAGAGAGGCCCAGCTACTCG (SEQ ID NO: 11)
A5 dazlF-sbf AATTAACCTGCAGGTTATGCCCTCTCCCCACTTCTC (SEQ ID NO: 12)
A6 dazlR-mfe AATTAACAATTGCCAAGCACCCTACAGCTCG (SEQ ID NO: 13)
B1 miR290-5F AATTAACCTGCAGGGATACTGTGTCTTGGGGAGAAAGC (SEQ ID NO: 14)
B2 miR290-5R AATTAACAATTGATACGGGAAGGAGTGCCGGG (SEQ ID NO: 15)
B3 miR290-3F AATTAAGGCGCGCCCAGCTCTGAAATCTGCAGAGCTG (SEQ ID NO: 16)
B4 miR290-3R AATTAAGGCCGGCCGGCATTTGCCACTATGCCTGC (SEQ ID NO: 17)
C1 Sox2-5F AATTAACCTGCAGGCCGGGGTTTCCTGATCTCTTGC (SEQ ID NO: 18)
C2 Sox2-5R AATTAACAATTGTCTGGCTCGGAAAGCTGGG (SEQ ID NO: 19)
C3 Sox2-3F AATTAAGGCGCGCCGGAGGGGGCTGCATTCTCAG (SEQ ID NO: 20)
C4 Sox2-3R AATTAAGGCCGGCCGCTACGAAACAGGTTCGAGACC (SEQ ID NO: 21)
D1 SOX2-SE CR42 CACCGCCAGCTTTCCGAGCCAGATG (SEQ ID NO: 22)
D2 SOX2-SE CR42 AAACCATCTGGCTCGGAAAGCTGGC (SEQ ID NO: 23)
D3 miR290-EN2 CR43 CACCGCAGATTTCAGAGCTGATAC (SEQ ID NO: 24)
D4 miR290-EN2 CR43 AAACGTATCAGCTCTGAAATCTGC (SEQ ID NO: 25)

Primer List—Bisulfite

GFP Nested R CTCGACCAAAATAAACACCACCCC (SEQ ID NO: 26)
Dazl Nested F CGATTAGAGAGTAGGTTTTGTTTGG (SEQ ID NO: 27)
Dazl F TTGAGTTCGGGTGTATGTGGAAGG (SEQ ID NO: 28)
Dazl R CGTCAATTACCAAACACCCTACAAC (SEQ ID NO: 29)
Dazl-Snrpn F CGAGTTGTAGGGTGTTTGGTAATTG (SEQ ID NO: 30)
Dazl-Snrpn R ACGTTACAAATCACTCCTCAAAACC (SEQ ID NO: 31)
Gapdh Nested F GGTTGTAGGAGAAGAAAATGAGATTAG (SEQ ID NO: 32)
Gapdh F GGTTGTAGGAGAAGAAAATGAGATTAG (SEQ ID NO: 33)
Gapdh R ACGTCAATTAAAAAAAAACCCAACTAC (SEQ ID NO: 34)
Gapdh-Snrpn F TAGTTTAAGGGCGTAGAGGTTTGAG (SEQ ID NO: 35)
Gapdh-Snrpn R ACGTTACAAATCACTCCTCAAAACC (SEQ ID NO: 36)
miR290 Nested F GAGGGGATTTTTTGGGGTAGAG (SEQ ID NO: 37)
miR290 Nested R CCCTTACTCACCATACTAACAAAATCC (SEQ ID NO: 38)
miR290-Snrpn F GATTTTTTGGGGTAGAGGTAGGTGTG (SEQ ID NO: 39)
miR290-Snrpn R CCACAAACCCAACTAACCTTCCTC (SEQ ID NO: 40)
Sox2 Nested F GTGGTTGTTGTGTTTAGTATGTGGG (SEQ ID NO: 41)
Sox2 Nested R CCCTTACTCACCATACTAACAAAATCC (SEQ ID NO: 42)
Sox2-Snrpn F GGTTGTTGTGTTTAGTATGTGGGTT (SEQ ID NO: 43)
Sox2-Snrpn R CCACAAACCCAACTAACCTTCC (SEQ ID NO: 44)

REFERENCE LIST

- Bird, A. (2002). DNA methylation patterns and epigenetic memory. Genes & development 16, 6-21.
- Brandeis, M., Frank, D., Keshet, I., Siegfried, Z., Mendelsohn, M., Nemes, A., Temper, V., Razin, A., and Cedar, H. (1994). Sp1 elements protect a CpG island from de novo methylation. Nature 371, 435-438.
- Buganim, Y., Faddah, D. A., Cheng, A. W., Itskovich, E., Markoulaki, S., Ganz, K., Klemm, S. L., van Oudenaarden, A., and Jaenisch, R. (2012). Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209-1222.
- Buganim, Y., Faddah, D. A., and Jaenisch, R. (2013). Mechanisms and models of somatic cell reprogramming. Nature reviews Genetics 14, 427-439.
- Buiting, K., Saitoh, S., Gross, S., Dittrich, B., Schwartz, S., Nicholls, R. D., and Horsthemke, B. (1995). Inherited microdeletions in the Angelman and Prader-Willi syndromes define an imprinting centre on human chromosome 15. Nature genetics 9, 395-400.
- Carey, B. W., Markoulaki, S., Hanna, J. H., Faddah, D. A., Buganim, Y., Kim, J., Ganz, K., Steine, E. J., Cassady, J. P., Creyghton, M. P., et al. (2011). Reprogramming factor stoichiometry influences the epigenetic state and biological properties of induced pluripotent stem cells. Cell stem cell 9, 588-598.
- Cedar, H., and Bergman, Y. (2012). Programming of DNA methylation patterns. Annual review of biochemistry 81, 97-117.
- Deaton, A. M., and Bird, A. (2011). CpG islands and the regulation of transcription. Genes & development 25, 1010-1022.
- Dowen, J. M., Fan, Z. P., Hnisz, D., Ren, G., Abraham, B. J., Zhang, L. N., Weintraub, A. S., Schuijers, J., Lee, T. I., Zhao, K., et al. (2014). Control of cell identity genes occurs in insulated neighborhoods in Mammalian chromosomes. Cell 159, 374-387.
- Ferguson-Smith, A. C. (2011). Genomic imprinting: the emergence of an epigenetic paradigm. Nature reviews Genetics 12, 565-575.
- Gingold, J. A., Fidalgo, M., Guallar, D., Lau, Z., Sun, Z., Zhou, H., Faiola, F., Huang, X., Lee, D. F., Waghray, A., et al. (2014). A genome-wide RNAi screen identifies opposing functions of Snai1 and Snai2 on the Nanog dependency in reprogramming. Molecular cell 56, 140-152.
- Hackett, J. A., Sengupta, R., Zylicz, J. J., Murakami, K., Lee, C., Down, T. A., and Surani, M. A. (2013). Germline DNA demethylation dynamics and imprint erasure through 5-hydroxymethylcytosine. Science 339, 448-452.
- Hanna, J. H., Saha, K., and Jaenisch, R. (2010). Pluripotency and cellular reprogramming: facts, hypotheses, unresolved issues. Cell 143, 508-525.
- Hnisz, D., Abraham, B. J., Lee, T. I., Lau, A., Saint-Andre, V., Sigova, A. A., Hoke, H. A., and Young, R. A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934-947.
- Hon, G. C., Rajagopal, N., Shen, Y., McCleary, D. F., Yue, F., Dang, M. D., and Ren, B. (2013). Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nature genetics 45, 1198-1206.
- Irizarry, R. A., Ladd-Acosta, C., Wen, B., Wu, Z., Montano, C., Onyango, P., Cui, H., Gabo, K., Rongione, M., Webster, M., et al. (2009). The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nature genetics 41, 178-186.
- Ivics, Z., Hackett, P. B., Plasterk, R. H., and Izsvak, Z. (1997). Molecular reconstruction of Sleeping Beauty, a Tc1-like transposon from fish, and its transposition in human cells. Cell 91, 501-510.
- Jaenisch, R., and Bird, A. (2003). Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nature genetics 33 Suppl, 245-254.
- Jones, P. A. (2012). Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nature reviews Genetics 13, 484-492.
- Junker, J. P., and van Oudenaarden, A. (2014). Every cell is special: genome-wide studies add a new dimension to single-cell biology. Cell 157, 8-11.
- Kantor, B., Kaufman, Y., Makedonski, K., Razin, A., and Shemer, R. (2004). Establishing the epigenetic status of the Prader-Willi/Angelman imprinting center in the gametes and embryo. Human molecular genetics 13, 2767-2779.
- Lee, H. J., Hore, T. A., and Reik, W. (2014). Reprogramming the methylome: erasing memory and creating diversity. Cell stem cell 14, 710-719.
- Li, E., Bestor, T. H., and Jaenisch, R. (1992). Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 69, 915-926.
- Mansour, A. A., Gafni, O., Weinberger, L., Zviran, A., Ayyash, M., Rais, Y., Krupalnik, V., Zerbib, M., Amann-Zalcenstein, D., Maza, I., et al. (2012). The H3K27 demethylase Utx regulates somatic and germ cell epigenetic reprogramming. Nature 488, 409-413.
- Mummaneni, P., Walker, K. A., Bishop, P. L., and Turker, M. S. (1995). Epigenetic gene inactivation induced by a cis-acting methylation center. The Journal of biological chemistry 270, 788-792.
- Pawlak, M., and Jaenisch, R. (2011). De novo DNA methylation by Dnmt3a and Dnmt3b is dispensable for nuclear reprogramming of somatic cells to a pluripotent state. Genes & development 25, 1035-1040.
- Rais, Y., Zviran, A., Geula, S., Gafni, O., Chomsky, E., Viukov, S., Mansour, A. A., Caspi, I., Krupalnik, V., Zerbib, M., et al. (2013). Deterministic direct reprogramming of somatic cells to pluripotency. Nature 502, 65-70.
- Reik, W., Dean, W., and Walter, J. (2001). Epigenetic reprogramming in mammalian development. Science 293, 1089-1093.
- Rhinn, M., and Dolle, P. (2012). Retinoic acid signalling during development. Development 139, 843-858.
- Rivera, C. M., and Ren, B. (2013). Mapping human epigenomes. Cell 155, 39-55.
- Sabag, O., Zamir, A., Keshet, I., Hecht, M., Ludwig, G., Tabib, A., Moss, J., and Cedar, H. (2014). Establishment of methylation patterns in ES cells. Nature structural & molecular biology 21, 110-112.
- Smith, Z. D., Chan, M. M., Humm, K. C., Karnik, R., Mekhoubad, S., Regev, A., Eggan, K., and Meissner, A. (2014). DNA methylation dynamics of the human preimplantation embryo. Nature 511, 611-615.
- Smith, Z. D., and Meissner, A. (2013). DNA methylation: roles in mammalian development. Nature reviews Genetics 14, 204-220.
- Soufi, A., Donahue, G., and Zaret, K. S. (2012). Facilitators and impediments of the pluripotency reprogramming factors' initial engagement with the genome. Cell 151, 994-1004.
- Stadler, M. B., Murr, R., Burger, L., Ivanek, R., Lienert, F., Scholer, A., van Nimwegen, E., Wirbelauer, C., Oakeley, E. J., Gaidatzis, D., et al. (2011). DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490-495.
- Turker, M. S. (2002). Gene silencing in mammalian cells and the spread of DNA methylation. Oncogene 21, 5388-5393.
- Vidal, S. E., Amlani, B., Chen, T., Tsirigos, A., and Stadtfeld, M. (2014). Combinatorial Modulation of Signaling Pathways Reveals Cell-Type-Specific Requirements for Highly Efficient and Synchronous iPSC Reprogramming. Stem cell reports 3, 574-584.
- Wang, H., Yang, H., Shivalila, C. S., Dawlaty, M. M., Cheng, A. W., Zhang, F., and Jaenisch, R. (2013). One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910-918.
- Whyte, W. A., Orlando, D. A., Hnisz, D., Abraham, B. J., Lin, C. Y., Kagey, M. H., Rahl, P. B., Lee, T. I., and Young, R. A. (2013). Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307-319.
- Xie, W., Schultz, M. D., Lister, R., Hou, Z., Rajagopal, N., Ray, P., Whitaker, J. W., Tian, S., Hawkins, R. D., Leung, D., et al. (2013). Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134-1148.
- Ziller, M. J., Gu, H., Muller, F., Donaghey, J., Tsai, L. T., Kohlbacher, O., De Jager, P. L., Rosen, E. D., Bennett, D. A., Bernstein, B. E., et al. (2013). Charting a dynamic DNA methylation landscape of the human genome. Nature 500, 477-481.

1. A nucleic acid comprising: (i) a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule that is detectable in individual mammalian cells, wherein the promoter is operably linked to the sequence that encodes the reporter molecule.
2. The nucleic acid of claim 1, wherein the mammalian imprinted gene promoter comprises at least a portion of a parent-of-origin differentially methylated region (DMR).
3. The nucleic acid of claim 1, further comprising a first homology arm located 5′ from the promoter and a second homology arm located 3′ from the sequence that encodes a reporter molecule, wherein the homology arms comprise sequences that are homologous to sequences that flank a target location in a mammalian genome.
4.-5. (canceled)
6. The nucleic acid of claim 3, wherein the target location is in proximity to an enhancer, superenhancer, promoter, gene body, CpG island, or low CpG region.
7.-10. (canceled)
11. The nucleic acid of claim 1, wherein the imprinted gene promoter is from the Snrpn gene.
12. The nucleic acid of claim 1, wherein the sequence of the promoter comprises SEQ ID NO: 1 or SEQ ID NO: 2.
13. The nucleic acid of claim 1 wherein the reporter molecule comprises a fluorescent protein or a luciferase.
14.-23. (canceled)
24. A cell comprising the nucleic acid or vector of claim 3.
25. (canceled)
26. A cell comprising a nucleic acid comprising (i) a mammalian imprinted gene promoter; and (ii) a sequence that encodes a reporter molecule, wherein the promoter is operably linked to the sequence that encodes the reporter molecule, and wherein the nucleic acid is integrated into the genome of the cell.
27.-29. (canceled)
30. The cell of claim 26, wherein the imprinted gene promoter is from the Snrpn gene.
31. (canceled)
32. The cell of claim 26 wherein the reporter molecule is detectable in individual cells.
33. The cell of claim 26 wherein the reporter molecule comprises a fluorescent protein or a luciferase.
34.-47. (canceled)
48. The cell of claim 26, wherein the cell is a mammalian cell, and wherein the genomic DNA of the cell comprises at least one region with aberrant DNA methylation.
49.-84. (canceled)
85. A method of detecting the methylation state of a DNA region of interest in the genome of a cell comprising: (a) providing one or more cells of claim 26, wherein the nucleic acid is integrated in proximity to a region of interest in the genome of the cell; and (b) measuring expression of the reporter molecule by the one or more cells, wherein the level of expression of the reporter molecule is indicative of the level of methylation of the region of interest, thereby detecting the methylation state of the region of interest.
86. The method of claim 85, wherein expression of the reporter molecule is indicative of hypomethylation of the DNA region of interest and lack of expression of the reporter molecule is indicative of hypermethylation of the DNA region of interest.
87.-101. (canceled)
102. A method of monitoring the methylation state of a region of interest in a cell over a period of time comprising steps of: (a) providing one or more cells of claim 26, wherein the nucleic acid is integrated in proximity to a region of interest in the genome of the cell; and (b) measuring expression of the reporter molecule by the one or more cells at two or more time points, wherein the level of expression of the reporter molecule is indicative of the level of methylation of the region of interest, thereby monitoring the methylation state of the region of interest over a period of time.
103.-113. (canceled)
114. The method of claim 102, wherein the method comprises: exposing the cell to an agent or condition of interest; measuring expression of the reporter molecule at two or more time points; comparing the level of expression of the reporter molecule between two or more of the time points, wherein a difference in the level of the reporter molecule between at least two of the time points indicates that the agent or condition affects methylation of the region of interest.
115.-121. (canceled)
122. A method of evaluating the effect of an agent on the methylation state of a DNA region of interest in a cell comprising steps of: contacting one or more cells of claim 26 with a test agent; measuring expression of the reporter molecule; and comparing the level of expression of the reporter molecule with a control value, wherein a difference between the measured value and the control value indicates that the test agent modulates the methylation state of the region of interest.
123.-124. (canceled)
125. The method of claim 122, wherein the method comprises detecting an increase in the level of expression of the reporter molecule as compared to the control value, thereby determining that the agent decreases methylation of the region of interest.
126. The method of claim 122, wherein the method comprises detecting a decrease in the level of expression of the reporter molecule as compared to the control value, thereby determining that the agent increases DNA methylation of the region of interest.
127.-137. (canceled)